Back in April 2017, Toshiba started offering GridDB SE Amazon Machine Images in the AWS Marketplace. These images allow system administrators and developers to quickly set up GridDB in the cloud. In this blog post we’ll look at how to effectively deploy a small cluster: an 8-node cluster with 5 application servers and 3 GridDB nodes. We’ll then run YCSB against it to demonstrate it working.

Launching the Instances

If you’re new to AWS, after creating and verifying your account, you’ll need to either upload an SSH public key or create one via the EC2 console. You must upload/create the key pair in the region you intend to launch the VMs in. Now we can head to the Marketplace for the AMIs of both GridDB SE and CentOS 7 that we will be using for the application servers. We use the CentOS 7 AMI instead of the RedHat or Amazon Linux AMI because the GridDB AMIs are based off CentOS 7, and having the same base platform may make deploying the solution easier. GridDB:
https://aws.amazon.com/marketplace/pp/B01N9QMCMF

CentOS 7: https://aws.amazon.com/marketplace/pp/B00O7WM7QW

Which instance type to use is going to be very application-specific. We know from past experience that YCSB is very CPU-intensive, but not very memory-intensive. This makes the C3/C4 instances an ideal fit for the application servers. We selected C4 as it has newer processors and faster networking for basically the same cost as C3. Since GridDB will keep as much of the database in memory as it can, the R3/R4 instances are more ideal as they contain more memory. R3 includes some SSD storage, but the included sizes are too small: we know from the estimating blog post that you need approximately 3x as much disk storage as memory for an in-memory database, and even more disk storage if the database will grow past memory bounds. So we’ll create R4 instances and attach additional SSD EBS volumes later.

From their Marketplace pages, we’ll create three GridDB instances and five CentOS 7 instances with our selected instance types using one-click launch. Now we need to create the SSD volumes for the GridDB servers. From the EC2 console, go to Volumes and then Create New Volume. Once again, check out the estimating blog post to determine how large of a volume you’ll need for your solution. A GP2 volume is a good mix of price and performance and would be recommended for most applications. The availability zone is a very important configuration option: it needs to be the exact same zone as your instances. If your instances are in the us-east-1e zone and you use the default selection of us-east-1a for the volume zone, you will not be able to attach the volume to your instance. Once you’ve created the volume, you can attach it to an instance by selecting it and then going to Actions -> Attach Volume.
Configuring The Instances

After your instances have launched, it’s a good idea to name them all, for example ycsb1-5 and gsserver1-3. We’ll only manage the other nodes from ycsb1 using ssh, so also upload your private key to it:
$ scp ~/.ssh/id_rsa ycsb1_public_ip:.ssh/
We’ll use the local IPs for all services; the external IPs are dynamic and will change on reboot. If your application requires consistent external access, you can assign Elastic IPs as required. One annoying thing about AWS is that local IP assignment is very random. Rather than having IPs like 172.16.1.10, .11, .12, .13, .15, it ends up being more like 172.31.63.83, 172.31.61.5, 172.31.57.184. To get around this, make three files:
all_ips: Contains a newline-separated list of all the instance local IPs.
ycsb_ips: Contains a newline-separated list of the application server local IPs.
griddb_ips: Contains a newline-separated list of the GridDB server local IPs.

To ease management of the small cluster, we’ll create two scripts to copy files and run programs on our nodes. Using Puppet or another configuration management system is highly recommended in larger clusters. runcmd.sh takes as input one of the files above plus a string of command arguments to run:
#!/bin/bash
for x in `cat $1`; do ssh -t centos@$x "$2"; done
cpall.sh takes as input one of the files above plus a source path and a destination path:
#!/bin/bash
for x in `cat $1`; do scp -r $2 centos@$x:$3; done
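For example, checking the GridDB service on every database node would be `./runcmd.sh griddb_ips "sudo systemctl status gridstore"`. Here is a minimal local sketch of what that loop expands to, using echo in place of ssh so no hosts are contacted (the IPs are the example addresses from this post):

```shell
# Build a sample griddb_ips file (example addresses from this post)
printf '172.31.57.184\n172.31.61.151\n172.31.57.21\n' > griddb_ips

# Dry run: echo stands in for ssh, showing what runcmd.sh would execute
for x in `cat griddb_ips`; do
  echo "ssh -t centos@$x sudo systemctl status gridstore"
done
```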
Starting the GridDB Servers

First we need to partition the attached SSD EBS volume we created above:
$ sudo fdisk /dev/xvdf
Command (m for help): n
Partition type:
p primary (0 primary, 0 extended, 4 free)
e extended
Select (default p): p
Partition number (1-4, default 1): 1
First sector (63-83886079, default 63):
Using default value 63
Last sector, +sectors or +size{K,M,G} (63-83886079, default 83886079):
Using default value 83886079
Partition 1 of type Linux and of size 40 GiB is set
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
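If you are provisioning several servers, the same dialog can be scripted rather than typed. A sketch of the answer sequence fdisk expects, mirroring the session above (n = new, p = primary, 1 = partition number, two blank lines accept the default first/last sectors, w = write); the piped fdisk command is shown commented out since it is destructive:

```shell
# Answer sequence matching the interactive fdisk session above
answers='n\np\n1\n\n\nw\n'
printf "%b" "$answers"

# On the GridDB server this would be piped into fdisk (destructive!):
#   printf "%b" "$answers" | sudo fdisk /dev/xvdf
```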
Then format the newly created partition:
$ sudo mkfs.ext4 /dev/xvdf1
mke2fs 1.42.9 (28-Dec-2013)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
2621440 inodes, 10485752 blocks
524287 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=2157969408
320 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
	4096000, 7962624

Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
There are several pieces of documentation about how to set up GridDB, but most of those guides use Community Edition on a private network that supports multicast. Most cloud services, including AWS, do not support multicast between instances, so GridDB must be configured using a fixed member list instead. Now edit /etc/fstab with your favorite editor, adding the new volume as follows:
/dev/xvdf1 /var/lib/gridstore/data ext4 defaults 0 0
You can then mount the new partition and ensure the permissions are correct:
$ sudo mount /var/lib/gridstore/data
$ sudo chown gsadm:gridstore /var/lib/gridstore/data
It’ll automatically be mounted on subsequent reboots. Now you can configure and start GridDB itself. It is best to perform all GridDB configuration as the gsadm user to prevent any possible permission issues in the future. You can switch to the gsadm user with the following command:
$ sudo su - gsadm
Edit /var/lib/gridstore/conf/gs_cluster.json on all three GridDB servers to look like the following, replacing $GRIDDB_1, $GRIDDB_2, and $GRIDDB_3 with the correct IP addresses:
{
"dataStore":{
"partitionNum":128,
"storeBlockSize":"64KB"
},
"cluster":{
"clusterName":"AWSCluster",
"replicationNum":2,
"heartbeatInterval":"5s",
"loadbalanceCheckInterval":"180s",
"notificationMember": [
{
"cluster": {"address":"$GRIDDB_1", "port":10010},
"sync": {"address":"$GRIDDB_1", "port":10020},
"system": {"address":"$GRIDDB_1", "port":10040},
"transaction": {"address":"$GRIDDB_1", "port":10001},
},
{
"cluster": {"address":"$GRIDDB_2", "port":10010},
"sync": {"address":"$GRIDDB_2", "port":10020},
"system": {"address":"$GRIDDB_2", "port":10040},
"transaction": {"address":"$GRIDDB_2", "port":10001},
},
{
"cluster": {"address":"$GRIDDB_3", "port":10010},
"sync": {"address":"$GRIDDB_3", "port":10020},
"system": {"address":"$GRIDDB_3", "port":10040},
"transaction": {"address":"$GRIDDB_3", "port":10001},
}
]
},
"sync":{
"timeoutInterval":"30s"
}
}
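Rather than hand-editing the placeholders on each server, the substitution can be scripted from the griddb_ips file created earlier. A minimal sketch, using a stand-in file in place of the real gs_cluster.json so it is safe to try anywhere (the sed invocation is the same either way):

```shell
# Example griddb_ips and a stand-in file containing the placeholders
printf '172.31.57.184\n172.31.61.151\n172.31.57.21\n' > griddb_ips
printf '$GRIDDB_1 $GRIDDB_2 $GRIDDB_3\n' > cluster_demo.txt

# Replace $GRIDDB_1..$GRIDDB_3 with the 1st..3rd IP from griddb_ips
i=1
for ip in `cat griddb_ips`; do
  sed -i "s/\$GRIDDB_$i/$ip/g" cluster_demo.txt
  i=`expr $i + 1`
done
cat cluster_demo.txt
```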
For optimum performance, edit the storeMemoryLimit field to 10240MB and the concurrency field to 2 in /var/lib/gridstore/conf/gs_node.json. Now set the password for the admin user on all three GridDB servers:
$ gs_passwd admin
Password:
Retype password:
$ exit
Finally, edit /etc/sysconfig/gridstore/gridstore.conf, changing CLUSTER_NAME=INPUT_YOUR_CLUSTER_NAME_HERE to CLUSTER_NAME=AWSCluster and setting MIN_NODE_NUM=3. Now you can start the server process on the three GridDB servers:
$ sudo systemctl enable gridstore
$ sudo systemctl start gridstore
If the start succeeds on all three GridDB servers, you can confirm the operation with the gs_stat command. Running it on the master will return a list of all nodes, but running gs_stat on a follower will include only itself and the master:
"cluster": {
"clusterName": "AWSCluster",
"clusterStatus": "FOLLOWER",
"designatedCount": 3,
"loadBalancer": "ACTIVE",
"master": {
"address": "172.31.57.21",
"port": 10040
},
"nodeList": [
{
"address": "172.31.57.184",
"port": 10040
},
{
"address": "172.31.57.21",
"port": 10040
}
],
"nodeStatus": "ACTIVE",
"notificationMode": "FIXED_LIST",
"partitionStatus": "NORMAL",
"startupTime": "2017-07-19T00:31:12Z",
"syncCount": 6
}
Application Server Configuration

Now it’s time to set up the application servers; unless otherwise specified, the following needs to be run on each individual node. First, we need to install the Java x64 RPM, which can be downloaded from
http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html.
$ sudo rpm -Uvh jdk-8u121-linux-x64.rpm
The GridDB client is located on the GridDB SE AMI instance. Please note that the GridDB SE and CE clients are not compatible: you can’t use a CE client with an SE server and vice-versa. You can copy the RPM files from one of the GridDB servers, but they are owned by root, so first fix the permissions:
$ ssh -t $GRIDDB_1 sudo chmod 755 /usr/gridstore/rpm
$ ssh -t $GRIDDB_1 sudo chmod 644 /usr/gridstore/rpm/griddb-se-java_lib-3.1.0-linux.x86_64.rpm
Now we can copy and install the RPMs on all of the application servers:
$ scp -r $GRIDDB_1:/usr/gridstore/rpm/griddb-se-java_lib-3.1.0-linux.x86_64.rpm ~
$ sudo rpm -Uvh ~/griddb-se-java_lib-3.1.0-linux.x86_64.rpm
Finally, we’ll edit .bashrc to set our CLASSPATH variable:
export CLASSPATH=$CLASSPATH:/usr/share/java/gridstore.jar
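After re-sourcing .bashrc, a quick sanity check confirms the jar is actually where the CLASSPATH says it is. A sketch (the jar path comes from the RPM installed above; on a machine without the RPM this will report it as missing):

```shell
# Append the GridDB jar to any existing CLASSPATH
export CLASSPATH=$CLASSPATH:/usr/share/java/gridstore.jar

# Verify each CLASSPATH entry actually exists on disk
for p in `echo $CLASSPATH | tr ':' ' '`; do
  if [ -f "$p" ]; then echo "found: $p"; else echo "MISSING: $p"; fi
done
```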
Setting Up YCSB

We will follow the Getting Started with YCSB blog post but make a few changes to account for the FIXED_LIST notification mechanism. We’ll use the following YCSB options for all of our YCSB commands:
YCSB_OPTIONS=" -p notificationMember=172.31.57.184:10001,172.31.61.151:10001,172.31.57.21:10001 -p clusterName=AWSCluster -p userName=admin -p password=admin -cp /usr/share/java/gridstore.jar -p fieldlength=100 -p fieldcount=10"
Then run a load workload to make sure everything is working:
$ ./bin/ycsb load griddb -P workloads/workloada $YCSB_OPTIONS
Then a transactional workload:
$ ./bin/ycsb run griddb -P workloads/workloada $YCSB_OPTIONS
Once everything is working on one application server node, copy the YCSB directory to the other client nodes and perform testing. If everything works, you’re ready to move on to the multi-node test.
Multi-Node YCSB

The following script executes five YCSB clients that load 10M rows into GridDB and then run Workload A, a simple read-write workload. Like the scripts above, you need to replace $GRIDDB_1,2,3 and $YCSB_2,3,4,5 with the correct IP addresses.
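The load phase splits the 10M keys into five disjoint ranges via insertstart/insertcount so no two clients insert the same keys. A small sketch of the arithmetic (note that `*` must be escaped as `\*` when used with expr, or the shell will glob-expand it):

```shell
# Split COUNT rows evenly across 5 clients; each gets a disjoint key range
COUNT=10000000
LOADSIZE=`expr $COUNT / 5`
for n in 0 1 2 3 4; do
  echo "client $n: insertstart=`expr $LOADSIZE \* $n` insertcount=$LOADSIZE"
done
```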
#!/bin/bash
COUNT=10000000
OPCOUNT=1000000
LOADSIZE=`expr $COUNT / 5`
YCSB_OPTIONS="-p notificationMember=$GRIDDB_1:10001,$GRIDDB_2:10001,$GRIDDB_3:10001 -p clusterName=AWSCluster -p userName=admin -p password=admin -cp /usr/share/java/gridstore.jar -p fieldlength=100 -p fieldcount=10"
echo Loading...
./ycsb-0.10.0/bin/ycsb load griddb -P ycsb-0.10.0/workloads/workloada $YCSB_OPTIONS -p recordcount=${COUNT} -p insertstart=`expr $LOADSIZE \* 0` -p insertcount=${LOADSIZE} -s &
pids="$! "
ssh $YCSB_2 ./ycsb-0.10.0/bin/ycsb load griddb -P ycsb-0.10.0/workloads/workloada $YCSB_OPTIONS -p recordcount=${COUNT} -p insertstart=`expr $LOADSIZE \* 1` -p insertcount=${LOADSIZE} -s &
pids="$! $pids"
ssh $YCSB_3 ./ycsb-0.10.0/bin/ycsb load griddb -P ycsb-0.10.0/workloads/workloada $YCSB_OPTIONS -p recordcount=${COUNT} -p insertstart=`expr $LOADSIZE \* 2` -p insertcount=${LOADSIZE} -s &
pids="$! $pids"
ssh $YCSB_4 ./ycsb-0.10.0/bin/ycsb load griddb -P ycsb-0.10.0/workloads/workloada $YCSB_OPTIONS -p recordcount=${COUNT} -p insertstart=`expr $LOADSIZE \* 3` -p insertcount=${LOADSIZE} -s &
pids="$! $pids"
ssh $YCSB_5 ./ycsb-0.10.0/bin/ycsb load griddb -P ycsb-0.10.0/workloads/workloada $YCSB_OPTIONS -p recordcount=${COUNT} -p insertstart=`expr $LOADSIZE \* 4` -p insertcount=${LOADSIZE} -s &
pids="$! $pids"
wait $pids
echo
echo Running...
./ycsb-0.10.0/bin/ycsb run griddb -P ycsb-0.10.0/workloads/workloada $YCSB_OPTIONS -p operationcount=${OPCOUNT} -p recordcount=${COUNT} -s &
pids="$! "
ssh $YCSB_2 ./ycsb-0.10.0/bin/ycsb run griddb -P ycsb-0.10.0/workloads/workloada $YCSB_OPTIONS -p operationcount=${OPCOUNT} -p recordcount=${COUNT} -s &
pids="$! $pids"
ssh $YCSB_3 ./ycsb-0.10.0/bin/ycsb run griddb -P ycsb-0.10.0/workloads/workloada $YCSB_OPTIONS -p operationcount=${OPCOUNT} -p recordcount=${COUNT} -s &
pids="$! $pids"
ssh $YCSB_4 ./ycsb-0.10.0/bin/ycsb run griddb -P ycsb-0.10.0/workloads/workloada $YCSB_OPTIONS -p operationcount=${OPCOUNT} -p recordcount=${COUNT} -s &
pids="$! $pids"
ssh $YCSB_5 ./ycsb-0.10.0/bin/ycsb run griddb -P ycsb-0.10.0/workloads/workloada $YCSB_OPTIONS -p operationcount=${OPCOUNT} -p recordcount=${COUNT} -s &
pids="$! $pids"
wait $pids
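Each YCSB client prints its own `[OVERALL], Throughput(ops/sec)` summary, so the cluster-wide throughput is simply their sum. A small sketch, assuming each client's output is redirected to a log file (e.g. by appending `> ycsbN.log 2>&1` to the commands above); the two logs created here are fabricated examples:

```shell
# Fabricated example logs standing in for real per-client YCSB output
printf '[OVERALL], Throughput(ops/sec), 21000.5\n' > ycsb1.log
printf '[OVERALL], Throughput(ops/sec), 19500.0\n' > ycsb2.log

# Sum the per-client throughput lines for the aggregate cluster figure
grep -h 'Throughput(ops/sec)' ycsb*.log \
  | awk -F', ' '{sum += $3} END {printf "Total ops/sec: %.1f\n", sum}'
# prints: Total ops/sec: 40500.5
```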
Congratulations, you’re all done! All of the above can be tweaked to find your ideal configuration.
If you have any questions about the blog, please create a Stack Overflow post here https://stackoverflow.com/questions/ask?tags=griddb .
Make sure that you use the “griddb” tag so our engineers can quickly reply to your questions.
