Building a GridDB Azure Cluster

This document explains how to build a multi-node cluster on Microsoft Azure for the testing, development and use of GridDB NoSQL software. It assumes the end user has general knowledge of Cloud Computing Concepts, Linux System Administration, and use of GridDB.

Install Azure Tools

For this document, we will assume Windows 8.1 or 10 is being used to control the Azure cluster.

First download and install the Azure PowerShell Cmdlets, the Azure Command Line Interface and AzCopy from https://azure.microsoft.com/en-us/downloads/.

Create The Base Image

The first step is to create a disk image that will be used for all nodes in the cluster. First log in to portal.azure.com and create the resource group BASEGRP by clicking on Resource Groups and then adding a new group. Then create a virtual machine, BASEVM, by clicking Resource Groups, then BASEGRP, and then adding a new resource. Search for CentOS 6.7 and select the OpenLogic image. Choose your desired Basic Settings and Size. In the Settings pane, click on Storage Account and Create New, picking a unique name such as BASESTOR. Also in the Settings pane, click Public IP address and Create New. A dynamic IP is sufficient here because we will create a new IP address resource for the final cluster’s head node.

Now you can log in to the host via ssh, using a tool such as PuTTY or MINGW/ssh, with the IP address listed and the user account created. Next we need to create the users that can log in to the nodes. This is simple with adduser.

# sudo adduser MYUSER
# sudo passwd MYUSER

For each user you create, we need to create SSH keys so we can log in to other nodes in the cluster without a password.

# sudo su - MYUSER
$ ssh-keygen -t rsa
$ cp .ssh/id_rsa.pub .ssh/authorized_keys
$ exit

We should also do this for the root user:

# ssh-keygen -t rsa
# cp .ssh/id_rsa.pub .ssh/authorized_keys
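If you would rather not answer the ssh-keygen prompts for every account, the same steps can be scripted. The following is a minimal sketch, not part of the original setup: the function name make_cluster_key and its directory argument are made up for illustration, and it assumes an empty passphrase is acceptable, which is what passwordless login between nodes requires.

```shell
#!/bin/sh
# Hypothetical helper: create a passphrase-less RSA key pair in the given
# directory (default ~/.ssh) and authorize it, as the manual steps above do.
make_cluster_key() {
    local dir="${1:-$HOME/.ssh}"
    # sshd is strict about permissions, so keep the directory private.
    mkdir -p "$dir" && chmod 700 "$dir" || return 1
    # Generate a key only if one does not already exist.
    [ -f "$dir/id_rsa" ] || ssh-keygen -q -t rsa -N "" -f "$dir/id_rsa" || return 1
    cp "$dir/id_rsa.pub" "$dir/authorized_keys" && chmod 600 "$dir/authorized_keys"
}
```

Run it once as each user, for example make_cluster_key with no arguments after sudo su - MYUSER.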

If other users should be able to sudo to root, add this line to /etc/sudoers:

MYUSER ALL=(ALL) NOPASSWD: ALL

Now we’re ready to start installing GridDB per the instructions in the GridDB Community Edition RPM Install Guide. If you’re using GridDB SE or AE, the details may be slightly different. Please refer to the GridDB SE/AE Quick Start for more information.

# rpm -Uvh https://github.com/griddb/griddb_nosql/releases/download/v3.0.0/griddb_nosql-3.0.0-1.linux.x86_64.rpm

We’re going to use a profile script to set the GridDB environment variables. As root, create /etc/profile.d/griddb_nosql.sh:

#!/bin/bash
export GS_HOME=/var/lib/gridstore
export GS_LOG=$GS_HOME/log
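Once the profile script is in place, a quick check in a new login shell can confirm that both variables are visible. This is only a sketch; the function name check_griddb_env is made up for illustration and is not a GridDB tool.

```shell
#!/bin/sh
# Hypothetical check: succeed only if both GridDB variables are set.
check_griddb_env() {
    local name value missing=0
    for name in GS_HOME GS_LOG; do
        # Indirectly read the variable whose name is in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Calling check_griddb_env prints nothing and returns success when the profile script has been applied.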

Log out and back in to apply the settings. GridDB has two configuration files, gs_cluster.json and gs_node.json. Versions prepared for a 24-node cluster are included in the ZIP file at the bottom of this page. Copy both files to their correct location in $GS_HOME/conf:

# cp GridDBAzureFiles/gs_*.json /var/lib/gridstore/conf
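The files in the ZIP are already set up for 24 nodes. If you change the cluster size, the fixed member list in gs_cluster.json has to be regenerated. The sketch below prints such a list. Everything in it is an assumption rather than a copy of the shipped files: the function name is made up, the 10.0.0.4 to 10.0.0.27 range is the one runcmd.sh uses later in this post, and the key names and ports follow GridDB’s documented fixed-list layout and default ports, so compare the output against the shipped gs_cluster.json before using it.

```shell
#!/bin/sh
# Hypothetical generator for a fixed-list "notificationMember" section.
# Prints one member per node for 10.0.0.<first> .. 10.0.0.<last>.
notification_members() {
    local first="$1" last="$2" x ip sep
    echo '"notificationMember": ['
    for x in $(seq "$first" "$last"); do
        ip="10.0.0.$x"
        sep=","
        # JSON does not allow a trailing comma after the last member.
        [ "$x" -eq "$last" ] && sep=""
        printf '    {"cluster": {"address": "%s", "port": 10010}, "sync": {"address": "%s", "port": 10020}, "system": {"address": "%s", "port": 10040}, "transaction": {"address": "%s", "port": 10001}}%s\n' \
            "$ip" "$ip" "$ip" "$ip" "$sep"
    done
    echo ']'
}
```

For the 24-node layout, notification_members 4 27 prints the block to paste into the cluster section of the file.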

Now create the GridDB password for the admin user.

$ bin/gs_passwd admin (input your_password)

Then de-provision the node so we can capture the image.

$ sudo waagent --deprovision

Then stop and delete the VM from the Azure portal by clicking on Virtual Machines, BASEVM, Delete, and following the instructions.

Deploy The Cluster

Prepare the Azure environment for every cmd.exe window you use:

> azure login
> azure account set MY_SUBSCRIPTION

To do a mass deployment with azure, you need two files that define your nodes and their parameters: nodes.json and params.json, respectively. These files are included in the ZIP file at the bottom of this page. Editing nodes.json is only required if you wish to change the size of the YCSB cluster, as it currently contains 24 nodes; simply delete or add node definitions within the file as necessary. You will need to edit params.json to contain the correct values: storageAccountName is set to the storage account used for the original node, and sourceVhdFilename is set to just the filename of the original node’s VHD. These values can be found in the Azure portal by going to the node, then Disks, then the entry under OS Disks, and parsing the URL. The first sub-domain is the storage account, while the final path component is the sourceVhdFilename. For example, with a URL such as https://griddbtestdisks.blob.core.windows.net/vhds/griddbtest120161220.vhd, the storageAccountName is griddbtestdisks and the sourceVhdFilename is griddbtest120161220.vhd.

Now we can deploy the nodes.
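Before running the deployment, the URL split described above can be double-checked in any POSIX shell, such as the MINGW shell mentioned earlier. This is a small sketch using only shell parameter expansion; the two function names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers that split an OS disk URL into the params.json values.

# The first sub-domain of the host name is the storage account.
storage_account_from_url() {
    local host="${1#*://}"   # drop the scheme
    host="${host%%/*}"       # keep only the host name
    echo "${host%%.*}"
}

# Everything after the last slash is the VHD filename.
vhd_filename_from_url() {
    echo "${1##*/}"
}
```

With the example URL above, the first function prints griddbtestdisks and the second prints griddbtest120161220.vhd. With both values entered in params.json, the deployment is started with the following command.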

> azure group deployment create -f nodes.json -e params.json -g CLUSTER

This will take some time. Once the nodes are deployed, they will be running and incurring charges.

Since we need to access the cluster, we need to add a public IP to node1. In the Azure Portal, go to Virtual machines, node1, Network Interfaces, nic1, IP Configurations, ipconfig1. Set Public IP address to Enabled, click IP Address and Create New, enter a desired name such as CLUSTERIP, select Static, click OK, and finally click Save on the nic1 pane.

Starting GridDB on the Cluster

Now we can SSH to the IP Address we just created and start working with the Azure VMs. The first thing we want to use is a small script that enables us to run commands on all VMs at the same time. Create runcmd.sh in your favorite editor:

#!/bin/bash

for x in `seq 4 27`; do
    sudo ssh  -o StrictHostKeyChecking=no 10.0.0.$x "$@" &
    pids[$x]=$!
done
for x in `seq 4 27`; do
    wait ${pids[$x]}
done

Make it executable:

$ chmod +x runcmd.sh

Now we can run commands on all of the hosts with one simple command:

$ ./runcmd.sh hostname
 node1
 node2
 node3
 …
 node23
 node24
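The script above waits for every host but does not tell you which ones failed. A variant that reports failures could look like the sketch below. It is an illustration rather than a replacement: the function name run_all and the REMOTE override are made up here, and the 10.0.0.x addressing is taken from runcmd.sh.

```shell
#!/bin/sh
# Hypothetical variant of runcmd.sh: run a command on 10.0.0.<first> ..
# 10.0.0.<last> in parallel and report every host whose command failed.
# REMOTE is an assumed override for the command used to reach a node.
run_all() {
    local first="$1" last="$2" x job failed=0 started=""
    local remote="${REMOTE:-sudo ssh -o StrictHostKeyChecking=no}"
    shift 2
    for x in $(seq "$first" "$last"); do
        # $remote is left unquoted on purpose so its options are split.
        $remote "10.0.0.$x" "$@" &
        started="$started $x:$!"   # remember host suffix and PID together
    done
    for job in $started; do
        if ! wait "${job#*:}"; then
            echo "10.0.0.${job%%:*} failed" >&2
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]
}
```

For the 24-node cluster, run_all 4 27 hostname behaves like ./runcmd.sh hostname but returns a non-zero status if any node could not be reached.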

The next step is to start all of the GridDB servers on all of the nodes:

$ ./runcmd.sh su -c gs_startnode gsadm
$ sleep 2
$ ./runcmd.sh "su -c 'gs_joincluster -c defaultCluster -u admin/your_password -n 24' gsadm"

You can check to see if the cluster is running with the gs_stat tool.

# gs_stat -u admin/your_password
{
    "checkpoint": {
        "archiveLog": 0,
        "backupOperation": 0,
        "duplicateLog": 0,
        "endTime": 1484681438011,
        "mode": "RECOVERY_CHECKPOINT",
        "normalCheckpointOperation": 0,
        "pendingPartition": 0,
        "requestedCheckpointOperation": 0,
        "startTime": 1484681437974
    },
    "cluster": {
        "clusterName": "defaultCluster",
        "clusterStatus": "FOLLOWER",
        "designatedCount": 5,
        "loadBalancer": "ACTIVE",
        "master": {
            "address": "10.3.0.5",
            "port": 10040
        },
        "nodeList": [
            {
                "address": "10.3.0.6",
                "port": 10040
            },
            {
                "address": "10.3.0.5",
                "port": 10040
            }
        ],
        "nodeStatus": "ACTIVE",
        "notificationMode": "FIXED_LIST",
        "partitionStatus": "NORMAL",
        "startupTime": "2017-01-17T19:30:36Z",
        "syncCount": 46
    },
    "currentTime": "2017-01-17T19:40:17Z",
    "performance": {
        "batchFree": 0,
        "checkpointFileSize": 65536,
... snip ...
        "totalRowWrite": 451987,
        "totalWriteOperation": 451987
    },
    "recovery": {
        "progressRate": 1
    },
    "version": "3.0.0 CE"
}
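If you want to check the cluster from a script, the interesting fields can be pulled out of this output with standard tools. The sketch below assumes the pretty-printed layout shown above, with one key per line; the function names are made up for illustration, and a real JSON parser would be more robust.

```shell
#!/bin/sh
# Hypothetical helpers that read gs_stat output on standard input.

# Print the first string value found for the given key, e.g. clusterStatus.
stat_field() {
    sed -n "s/.*\"$1\": *\"\\([^\"]*\\)\".*/\\1/p" | head -n 1
}

# Print the address listed inside the "master" object.
master_address() {
    awk '/"master"/ { in_master = 1 }
         in_master && /"address"/ { gsub(/[",]/, "", $2); print $2; exit }'
}
```

For example, gs_stat -u admin/your_password | stat_field nodeStatus prints the node status, and piping the same output to master_address prints the master node’s IP address.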

Using gs_stat with the -s option and the IP address of the cluster’s master node (which is automatically determined with a bully algorithm) will give you further details.

Now check out how to use the YCSB GridDB connector to benchmark your new cluster. Some of the files used in this post have been made available in a ZIP file you can download here.

If you have any questions about the blog, please create a Stack Overflow post at https://stackoverflow.com/questions/ask?tags=griddb. Make sure that you use the “griddb” tag so our engineers can quickly reply to your questions.