This document explains how to build a multi-node cluster on Microsoft Azure for the testing, development and use of GridDB NoSQL software. It assumes the end user has general knowledge of Cloud Computing Concepts, Linux System Administration, and use of GridDB.
Install Azure Tools
For this document, we will assume Windows 8.1 or 10 is being used to control the Azure cluster.
First, download and install the Azure PowerShell cmdlets, the Azure Command Line Interface, and AzCopy from https://azure.microsoft.com/en-us/downloads/
Create The Base Image
The first step is to create a disk image that will be used for every node in the cluster. First, log in to portal.azure.com and create the resource group BASEGRP by clicking on Resource Groups and then adding a new group. Then create a virtual machine, BASEVM, by clicking Resource Groups, then BASEGRP, and then adding a new resource. Search for CentOS 6.7 and select the OpenLogic image.
Choose your desired Basic Settings and Size. In the Settings pane, click on Storage Account and Create New, picking a unique name such as BASESTOR. Also in the Settings pane, click Public IP address and Create New. A dynamic IP is sufficient here, as we will create a new IP address resource for the final cluster’s head node.
Now you can log in to the host via SSH with a tool such as PuTTY or MinGW ssh, using the IP address listed and the user account you created.
Next we need to create the users that can log in to the nodes. This is simple with adduser:
# sudo adduser MYUSER
# sudo passwd MYUSER
For each user you create, we need to generate SSH keys so that the user can log in to other nodes in the cluster without a password.
# sudo su - MYUSER
$ ssh-keygen -t rsa
$ cp .ssh/id_rsa.pub .ssh/authorized_keys
$ exit
We should also do this for the root user:
# ssh-keygen -t rsa
# cp .ssh/id_rsa.pub .ssh/authorized_keys
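Note that cp replaces any authorized_keys file that already exists. As a minimal sketch, the public key can instead be appended only when it is missing, with the conventional restrictive permissions applied afterwards. The function name authorize_own_key and its optional directory argument are hypothetical and are not part of the original steps.

```shell
# Hypothetical helper: add the user's own public key to authorized_keys
# without overwriting keys that are already there.
# $1 = SSH directory (assumption: defaults to $HOME/.ssh)
authorize_own_key() {
    ssh_dir="${1:-$HOME/.ssh}"
    mkdir -p "$ssh_dir"
    touch "$ssh_dir/authorized_keys"
    # Append the public key only if that exact line is not present yet
    if ! grep -qxF -f "$ssh_dir/id_rsa.pub" "$ssh_dir/authorized_keys"; then
        cat "$ssh_dir/id_rsa.pub" >> "$ssh_dir/authorized_keys"
    fi
    # Keep the directory and key list private to the owning user
    chmod 700 "$ssh_dir"
    chmod 600 "$ssh_dir/authorized_keys"
}
```

Run it after ssh-keygen, once as each user and once as root. Calling it a second time leaves the file unchanged.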
If other users should be able to sudo to root, add this line to /etc/sudoers:
MYUSER ALL=(ALL) NOPASSWD: ALL
Now we’re ready to start installing GridDB per the instructions in the GridDB Community Edition RPM Install Guide. If you’re using GridDB SE or AE, the details may be slightly different. Please refer to the GridDB SE/AE Quick Start for more information.
# rpm -Uvh https://github.com/griddb/griddb_nosql/releases/download/v3.0.0/griddb_nosql-3.0.0-1.linux.x86_64.rpm
We’re going to use a profile script to set the GridDB environment variables. As root, create /etc/profile.d/griddb_nosql.sh:
#!/bin/bash
export GS_HOME=/var/lib/gridstore
export GS_LOG=$GS_HOME/log
Log out and then back in, and the settings will be applied. GridDB has two configuration files, gs_cluster.json and gs_node.json. Versions prepared for a 24-node cluster are included in the ZIP file at the bottom of this page. Copy both files to their correct location in $GS_HOME/conf:
# cp GridDBAzureFiles/gs_*.json /var/lib/gridstore/conf
Now create the GridDB password for the admin user:
$ bin/gs_passwd admin
(input your_password)
Then de-provision the node so we can capture the image.
$ sudo waagent --deprovision
Then stop and delete the VM from the Azure portal by clicking on Virtual Machines, BASEVM, Delete, and following the instructions.
Deploy The Cluster
Prepare the Azure environment for every cmd.exe window you use:
> azure login
> azure account set MY_SUBSCRIPTION
To do a mass deployment with Azure, you need two files that define your nodes and their parameters: nodes.json and params.json respectively. These files are included in the ZIP file at the bottom of this page. Editing nodes.json is only required if you wish to change the size of the YCSB cluster, which currently contains 24 nodes. Simply delete or add node definitions within the file as necessary.
You will need to edit params.json so that it contains the correct values. storageAccountName is set to the storage account used for the original node, and sourceVhdFilename is set to just the filename of the original node’s disk. These values can be found in the Azure portal by going to the node, then Disks, then the entry under OS Disks, and parsing the URL. The first subdomain is the storage account, while the final path component is the sourceVhdFilename. For example, with a URL such as https://griddbtestdisks.blob.core.windows.net/vhds/griddbtest120161220.vhd, the storageAccountName is griddbtestdisks and the sourceVhdFilename is griddbtest120161220.vhd. Now we can deploy the nodes:
> azure group deployment create -f nodes.json -e params.json -g CLUSTER
This will take some time. Once the nodes are deployed, they will be running and incurring charges.
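When filling in params.json before the deployment, the two values can be pulled out of the OS disk URL with plain shell parameter expansion. This is only a sketch: the function names below are hypothetical, and the URL is the example given earlier rather than a real disk.

```shell
# Example OS disk URL taken from the text; substitute your own.
vhd_url="https://griddbtestdisks.blob.core.windows.net/vhds/griddbtest120161220.vhd"

# Hypothetical helper: print the first subdomain of the URL,
# which is the value for storageAccountName.
storage_account_from_url() {
    host="${1#*://}"      # drop the scheme (https://)
    host="${host%%/*}"    # keep only the host name
    printf '%s\n' "${host%%.*}"
}

# Hypothetical helper: print the last path component of the URL,
# which is the value for sourceVhdFilename.
vhd_filename_from_url() {
    printf '%s\n' "${1##*/}"
}

storage_account_from_url "$vhd_url"
vhd_filename_from_url "$vhd_url"
```

For the example URL, the two calls print griddbtestdisks and griddbtest120161220.vhd, matching the values worked out by hand.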
Since we need to be able to access the cluster, we need to add a public IP to node1. In the Azure Portal, go to Virtual machines, node1, Network Interfaces, nic1, “IP Configurations”, “ipconfig1”, set Public IP address to Enabled, then click IP Address, Create New, enter a desired name such as CLUSTERIP, select Static, click Okay, and finally click Save on the nic1 pane.
Starting GridDB on the Cluster
Now we can SSH to the IP Address we just created and start working with the Azure VMs. The first thing we want to use is a small script that enables us to run commands on all VMs at the same time. Create runcmd.sh in your favorite editor:
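#!/bin/bash
# Start the command on every node (10.0.0.4 to 10.0.0.27) in the background
for x in `seq 4 27`; do
    sudo ssh -o StrictHostKeyChecking=no 10.0.0.$x "$@" &
    pids[$x]=$!
done
# Wait for every node to finish before returning
for x in `seq 4 27`; do
    wait ${pids[$x]}
done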
Make it executable:
$ chmod +x runcmd.sh
Now we can run commands on all of the hosts with one simple command:
$ ./runcmd.sh hostname
node1
node2
node3
…
node23
node24
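The address range 4 to 27 is hard-coded twice in runcmd.sh, so it has to be changed in both loops if you alter the number of nodes in nodes.json. A parameterized variant may be easier to maintain. The following is a sketch only: the function names node_addresses and run_on_nodes and the SUBNET variable are assumptions introduced here, and the 10.0.0 prefix is taken from the script above.

```shell
# Assumption: all nodes sit in one /24 with this prefix, as in runcmd.sh.
SUBNET="${SUBNET:-10.0.0}"

# Print one node address per line for host numbers $1 through $2.
node_addresses() {
    for x in $(seq "$1" "$2"); do
        printf '%s.%s\n' "$SUBNET" "$x"
    done
}

# Run a command on every node in parallel, then wait for all of them.
# $1 = first host number, $2 = last host number, rest = command to run.
# Returns non-zero if the command failed on any node.
run_on_nodes() {
    first="$1"
    last="$2"
    shift 2
    pids=""
    for addr in $(node_addresses "$first" "$last"); do
        sudo ssh -o StrictHostKeyChecking=no "$addr" "$@" &
        pids="$pids $!"
    done
    rc=0
    for pid in $pids; do
        wait "$pid" || rc=1
    done
    return "$rc"
}

# Example (not executed here): run_on_nodes 4 27 hostname
```

Unlike the original script, this version also reports a failure on any single node through its exit status.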
The next step is to start all of the GridDB servers on all of the nodes:
$ ./runcmd.sh su -c gs_startnode gsadm
$ sleep 2
$ ./runcmd.sh "su -c 'gs_joincluster -c defaultCluster -u admin/your_password -n 24' gsadm"
You can check to see if the cluster is running with the gs_stat tool.
# gs_stat -u admin/your_password
{
    "checkpoint": {
        "archiveLog": 0,
        "backupOperation": 0,
        "duplicateLog": 0,
        "endTime": 1484681438011,
        "mode": "RECOVERY_CHECKPOINT",
        "normalCheckpointOperation": 0,
        "pendingPartition": 0,
        "requestedCheckpointOperation": 0,
        "startTime": 1484681437974
    },
    "cluster": {
        "clusterName": "defaultCluster",
        "clusterStatus": "FOLLOWER",
        "designatedCount": 5,
        "loadBalancer": "ACTIVE",
        "master": {
            "address": "10.3.0.5",
            "port": 10040
        },
        "nodeList": [
            {
                "address": "10.3.0.6",
                "port": 10040
            },
            {
                "address": "10.3.0.5",
                "port": 10040
            }
        ],
        "nodeStatus": "ACTIVE",
        "notificationMode": "FIXED_LIST",
        "partitionStatus": "NORMAL",
        "startupTime": "2017-01-17T19:30:36Z",
        "syncCount": 46
    },
    "currentTime": "2017-01-17T19:40:17Z",
    "performance": {
        "batchFree": 0,
        "checkpointFileSize": 65536,
        ... snip ...
        "totalRowWrite": 451987,
        "totalWriteOperation": 451987
    },
    "recovery": {
        "progressRate": 1
    },
    "version": "3.0.0 CE"
}
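If you want to check the status from a script rather than by eye, individual fields can be pulled out of this output. The sketch below uses sed on the text and is deliberately rough; a real JSON parser would be more robust. The function names stat_field and node_looks_ready are hypothetical, and treating nodeStatus ACTIVE together with partitionStatus NORMAL as "ready" is an assumption based only on the sample output above.

```shell
# Hypothetical helper: read gs_stat output on standard input and print
# the value of the first string field named $1.
stat_field() {
    sed -n "s/.*\"$1\": *\"\([^\"]*\)\".*/\1/p" | head -n 1
}

# Hypothetical helper: succeed if the output on standard input reports
# nodeStatus ACTIVE and partitionStatus NORMAL (assumed "ready" values).
node_looks_ready() {
    out="$(cat)"
    [ "$(printf '%s\n' "$out" | stat_field nodeStatus)" = "ACTIVE" ] &&
    [ "$(printf '%s\n' "$out" | stat_field partitionStatus)" = "NORMAL" ]
}

# Example (not executed here):
#   gs_stat -u admin/your_password | stat_field clusterStatus
#   gs_stat -u admin/your_password | node_looks_ready && echo "node ready"
```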
Using gs_stat with the -s option and the IP address of the cluster’s master node (which is automatically determined with a bully algorithm) will give you further details.
Now check out how to use the YCSB GridDB connector to benchmark your new cluster. Some of the files used in this post have been made available in a zip file you can download here.
If you have any questions about this blog post, please create a Stack Overflow post at https://stackoverflow.com/questions/ask?tags=griddb
Make sure that you use the “griddb” tag so our engineers can quickly reply to your questions.