GridDB Community Edition 4.1 adds online node addition and removal, a feature that was previously available only in the commercial Standard Edition. This first post of a two-part series shows how to set up a three-node cluster on public cloud infrastructure. The second post will cover recovering from a failure as well as adding and removing nodes.
We’ll assume you have deployed three CentOS 7 instances on the same virtual network: griddb1 (192.168.1.10), griddb2 (192.168.1.11), and griddb3 (192.168.1.12). Run each of the following steps on every node before moving on to the next step.
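Before starting, it’s worth confirming that the nodes can reach each other over the virtual network. For example, from griddb1 (the addresses below are the example values used in this post):

$ ping -c 3 192.168.1.11
$ ping -c 3 192.168.1.12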
#1 Install GridDB
$ sudo rpm -Uvh https://github.com/griddb/griddb_nosql/releases/download/v4.1.1/griddb_nosql-4.1.1-1.linux.x86_64.rpm
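The package creates the gsadm operating system user that is used throughout the rest of this post. A quick, optional way to confirm the install succeeded is to query the package and check that the user exists:

$ rpm -qi griddb_nosql
$ id gsadm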
#2 Create and Deploy Configuration Files
The first step is to disable iptables and firewalld. In a production environment, you should instead create firewall rules that allow traffic on all of the GridDB ports.
$ sudo su -
# systemctl disable firewalld
# systemctl stop firewalld
# iptables -F INPUT
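If you would rather keep firewalld running, a rough sketch of opening the ports used in this post’s configuration (cluster, sync, system, transaction, and SQL) might look like the following; adjust the list to match your gs_cluster.json and gs_node.json.

# firewall-cmd --permanent --add-port=10001/tcp --add-port=10010/tcp --add-port=10020/tcp --add-port=10040/tcp --add-port=10080/tcp --add-port=20001/tcp
# firewall-cmd --reload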
All GridDB operations should be performed as the gsadm user.
$ sudo su - gsadm
gs_cluster.json specifies the name of the cluster, the definition of the nodes in the cluster, and the replication number. With replicationNum set to 2, two nodes will contain each piece of data stored in GridDB.
$ cat > /var/lib/gridstore/conf/gs_cluster.json << EOF
{
    "dataStore":{
        "partitionNum":128,
        "storeBlockSize":"64KB"
    },
    "cluster":{
        "clusterName":"defaultCluster",
        "replicationNum":2,
        "notificationInterval":"5s",
        "heartbeatInterval":"5s",
        "loadbalanceCheckInterval":"180s",
        "notificationMember": [
            {
                "cluster": {"address":"192.168.1.10", "port":10010},
                "sync": {"address":"192.168.1.10", "port":10020},
                "system": {"address":"192.168.1.10", "port":10080},
                "transaction": {"address":"192.168.1.10", "port":10001},
                "sql": {"address":"192.168.1.10", "port":20001}
            },
            {
                "cluster": {"address":"192.168.1.11", "port":10010},
                "sync": {"address":"192.168.1.11", "port":10020},
                "system": {"address":"192.168.1.11", "port":10080},
                "transaction": {"address":"192.168.1.11", "port":10001},
                "sql": {"address":"192.168.1.11", "port":20001}
            },
            {
                "cluster": {"address":"192.168.1.12", "port":10010},
                "sync": {"address":"192.168.1.12", "port":10020},
                "system": {"address":"192.168.1.12", "port":10040},
                "transaction": {"address":"192.168.1.12", "port":10001},
                "sql": {"address":"192.168.1.12", "port":20001}
            }
        ]
    },
    "sync":{
        "timeoutInterval":"30s"
    }
}
EOF
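A stray comma or bracket in this file will keep the node from starting, so it can be worth validating the JSON after creating it. Python ships with CentOS 7, so one quick, optional check is:

$ python -m json.tool /var/lib/gridstore/conf/gs_cluster.json > /dev/null && echo OK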
In gs_node.json, you set how much memory and how many threads GridDB should use. The following values are a good starting point for a node with 8 cores and 8GB of RAM.
$ cat > /var/lib/gridstore/conf/gs_node.json << EOF
{
    "dataStore":{
        "dbPath":"data",
        "backupPath":"backup",
        "storeMemoryLimit":"5120MB",
        "storeWarmStart":true,
        "concurrency":8,
        "logWriteMode":1,
        "persistencyMode":"NORMAL",
        "affinityGroupSize":4
    },
    "checkpoint":{
        "checkpointInterval":"1200s",
        "checkpointMemoryLimit":"1024MB",
        "useParallelMode":false
    },
    "cluster":{
        "servicePort":10010
    },
    "sync":{
        "servicePort":10020
    },
    "system":{
        "servicePort":10040,
        "eventLogPath":"log"
    },
    "transaction":{
        "servicePort":10001,
        "connectionLimit":5000
    },
    "trace":{
        "default":"LEVEL_ERROR",
        "dataStore":"LEVEL_ERROR",
        "collection":"LEVEL_ERROR",
        "timeSeries":"LEVEL_ERROR",
        "chunkManager":"LEVEL_ERROR",
        "objectManager":"LEVEL_ERROR",
        "checkpointFile":"LEVEL_ERROR",
        "checkpointService":"LEVEL_INFO",
        "logManager":"LEVEL_WARNING",
        "clusterService":"LEVEL_ERROR",
        "syncService":"LEVEL_ERROR",
        "systemService":"LEVEL_INFO",
        "transactionManager":"LEVEL_ERROR",
        "transactionService":"LEVEL_ERROR",
        "transactionTimeout":"LEVEL_WARNING",
        "triggerService":"LEVEL_ERROR",
        "sessionTimeout":"LEVEL_WARNING",
        "replicationTimeout":"LEVEL_WARNING",
        "recoveryManager":"LEVEL_INFO",
        "eventEngine":"LEVEL_WARNING",
        "clusterOperation":"LEVEL_INFO",
        "ioMonitor":"LEVEL_WARNING"
    }
}
EOF
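The memory and thread settings should scale with the hardware. As a purely hypothetical example, on a node with 4 cores and 4GB of RAM you might scale the values above down proportionally, changing only these two keys:

"storeMemoryLimit":"2560MB",
"concurrency":4,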
Finally, set the password for the admin user. Use something more secure if you are doing more than just testing!
$ gs_passwd admin -p admin
#3 Start GridDB
$ sudo su - gsadm
$ gs_startnode
$ gs_joincluster -u admin/admin -n 3
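Once gs_joincluster has been run on all three nodes, the cluster forms automatically. While waiting, a quick way to check the local node is to filter the gs_stat output (shown in full in the next step) for the fields of interest:

$ gs_stat -u admin/admin | grep -E '"nodeStatus"|"clusterStatus"|"activeCount"'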
#4 Finished
Finally, run gs_stat to see if the cluster is running correctly. If a complete node list is not shown as below, move to the master node and run gs_stat there. If gs_stat shows SUB_CLUSTER, the cluster is NOT running correctly, and the nodes are most likely having trouble communicating with each other.
$ sudo su - gsadm
$ gs_stat -u admin/admin
... snip ...
    "cluster": {
        "activeCount": 3,
        "clusterName": "defaultCluster",
        "clusterStatus": "MASTER",
        "designatedCount": 3,
        "loadBalancer": "ACTIVE",
        "master": {
            "address": "192.168.1.10",
            "port": 10040
        },
        "nodeList": [
            {
                "address": "192.168.1.10",
                "port": 10040
            },
            {
                "address": "192.168.1.11",
                "port": 10080
            },
            {
                "address": "192.168.1.12",
                "port": 10080
            }
        ],
        "nodeStatus": "ACTIVE",
        "notificationMode": "FIXED_LIST",
        "partitionStatus": "NORMAL",
        "startupTime": "2019-03-15T04:57:16Z",
        "syncCount": 173
    },
... snip ...
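If you want a one-line health summary rather than the full output, the JSON can be piped through Python to print the cluster status and the number of active nodes out of the number expected; this is just a convenience and assumes Python is available on the node:

$ gs_stat -u admin/admin | python -c 'import json,sys; c=json.load(sys.stdin)["cluster"]; print("%s %d/%d" % (c["clusterStatus"], c["activeCount"], c["designatedCount"]))'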
If you have any questions about this blog, please create a Stack Overflow post at https://stackoverflow.com/questions/ask?tags=griddb and make sure you use the “griddb” tag so our engineers can quickly reply to your questions.