GridDB is a time series database designed for companies that need to store Big Data structures behind high-performance applications. It offers several of the attributes innovative companies look for: high performance thanks to its key-value architecture, high scalability to serve a growing number of customers or clients, and high availability through a persistent cluster infrastructure that increases fault tolerance. This article covers the main techniques for fine-tuning a GridDB database. Fine-tuning optimizes your database so it operates at full capacity, letting you save on resources while still achieving high performance. As a starting point, we will set up an environment to interact with the GridDB architecture. Next, we will introduce the fine-tuning parameters and explain their use. Last, we will apply these parameters to improve performance and availability.
Setting up Your Environment
To successfully fine-tune the operations performed in your GridDB database, you will need to install and configure the following tools and dependencies before starting:
- Anaconda – Jupyter
- Python 3.8 – GridDB Python Client 0.8.3
- Swig 4.0.2 – Swig Installation
We will use the command-line interface (CLI) to perform the fine-tuning process, so no specific operating system is required for these changes and configurations. For the edits themselves, any editor capable of handling JSON (JavaScript Object Notation) files can be used to modify the configuration file.
Setting the Tuning Parameters
The areas of focus that we will target in our tuning process are performance, availability, and reliability. These parameters encapsulate most of the attributes a database administrator looks for when choosing a database cluster. GridDB exposes these fine-tuning parameters through the node definition file, which can be altered by editing the values of the GridDB configuration objects. The attributes that can be fine-tuned are as follows:
/dataStore/persistencyMode
: A string parameter used to set the persistency mode.
/dataStore/logWriteMode
: An integer parameter used to set the log write mode.
/transaction/replicationMode
: An integer parameter used to set the replication mode.
/dataStore/storeWarmStart
: A boolean parameter used to set the start processing mode.
/dataStore/dbPath
: A string parameter used to set the database file directory.
/dataStore/backupPath
: A string parameter used to set the backup file directory.
/dataStore/storeMemoryLimit
: A string parameter used to set the memory buffer size.
/dataStore/concurrency
: An integer parameter used to set the processing parallelism.
/dataStore/affinityGroupSize
: An integer parameter that sets the number of data affinity groups.
/checkpoint/checkpointInterval
: An integer parameter that sets the checkpoint execution interval in seconds.
/system/eventLogPath
: A string parameter used to set the output directory of the event log file.
/transaction/connectionLimit
: An integer parameter that sets the upper limit on the number of connections.
/trace/category
: A string parameter used to set the event log output level.
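Because the node definition file is plain JSON, these parameters can also be adjusted programmatically. The sketch below is illustrative: the file name `gs_node.json` and the starting content are our own assumptions, and on a real node you would point it at your installation's configuration file instead.

```python
import json

# Illustrative starting point; a real node already has this file.
with open("gs_node.json", "w") as f:
    json.dump({"dataStore": {"persistencyMode": "NORMAL", "logWriteMode": 0}}, f)

def set_parameters(path, updates):
    """Load the JSON config, apply (section, key, value) updates, write it back."""
    with open(path) as f:
        conf = json.load(f)
    for section, key, value in updates:
        conf.setdefault(section, {})[key] = value
    with open(path, "w") as f:
        json.dump(conf, f, indent=4)
    return conf

conf = set_parameters("gs_node.json", [("dataStore", "logWriteMode", 1)])
print(conf["dataStore"]["logWriteMode"])  # 1
```

After a change like this, the node must be restarted for the new values to take effect.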
To fine-tune your GridDB database, we need to edit the GridDB configuration file. This task can be done by opening the file using a file editor and then changing the values depending on the options explained in the following sections.
Update Performance
Before updating the performance of our GridDB database, we need to switch to the `gsadm` user. This gives us the permissions required to edit the configuration file of our database. This task is achieved with the following command:
sudo su - gsadm
You will need to use a file editor to edit your configuration file. In our example, we will use the `vim` text editor. The node definition file is located in the `conf` directory. This task is achieved with the following command:
vim conf/gs_node.json
If your objective is to delay the synchronization process of your database, the `logWriteMode` value should be changed to `1`. This setting pairs well with `persistencyMode` set to `NORMAL`. With this configuration, your log is only written on the delayed schedule you select and configure, which lets you use your resources efficiently and focus your computational power on other areas of your application. This task is achieved with the following:
"dataStore":{
"logWriteMode":1,
"persistencyMode":"NORMAL",
}
If your objective is to perform the synchronization process consistently, the `logWriteMode` value should be changed to `0`. This setting pairs well with `persistencyMode` set to `KEEP_ALL_LOGS`, which keeps every transaction log so that each operation executed in your database remains on disk. This approach is storage intensive but helps you keep track of your database operations. This task is achieved with the following:
"dataStore":{
"logWriteMode":0,
"persistencyMode":"KEEP_ALL_LOGS",
}
Performance and Availability
GridDB is known for being a high-performance, high-availability database. These features are resource intensive, as the data stored in GridDB is replicated to additional nodes. However, the `replicationMode` key enables control over this replication: we can choose how it is done. If we are building an application where high data availability and fault tolerance are a must, this value should be set to `0`. Setting the value to `0` means we take full advantage of the replication benefits by using full asynchronous replication, which is the default value in our configuration file. This task is done as follows:
"transaction":{
"replicationMode":0
}
Another alternative to our initial choice is an application where performance is the number one priority. If you are building a transaction-based application where speed is a must, this can be achieved by setting the `replicationMode` key to the value `1`. This switches your database to semi-synchronous replication, which in our testing saved resources and delivered better performance. This task is done as follows:
"transaction":{
"replicationMode":1
}
Fine-Tuning Use Case
To test the consequences of fine-tuning with a real-world example, we will load a dataset using Python and add it to our GridDB database. To measure the performance changes, we will test the read operation on our dataset using the `SELECT` command. This test will enable us to compare our database before and after fine-tuning.
The first step in our journey will be to read the dataset using the `read_csv` function. In other words, we will read our dataset as a data frame and then load it into our database. To conduct this task, we will run the following Python code:
import pandas as pd

df = pd.read_csv("iris.csv")
The second step in our use case is to extract the data types of our columns. This will be extremely helpful when creating our container, since we must define the data type of each column. To conduct this task, we will run the `dtypes` command:
df.dtypes
The output of the `dtypes` method is as follows:
s_length float64
s_width float64
p_length float64
p_width float64
variety object
dtype: object
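Since each pandas dtype must be mapped by hand to a GridDB column type in the next step, a small helper can derive that mapping automatically. This is a sketch under our own assumptions: the `DTYPE_TO_GRIDDB` table and the `column_schema` helper are ours, not part of the GridDB client, and the type names mirror the `griddb.Type` enum members used below.

```python
import pandas as pd

# Assumed mapping from pandas dtype names to GridDB column type names.
DTYPE_TO_GRIDDB = {
    "float64": "FLOAT",
    "int64": "LONG",
    "object": "STRING",
}

def column_schema(df):
    """Return [[column_name, griddb_type_name], ...] for building ContainerInfo."""
    return [[name, DTYPE_TO_GRIDDB[str(dtype)]] for name, dtype in df.dtypes.items()]

df = pd.DataFrame({
    "s_length": [5.1], "s_width": [3.5],
    "p_length": [1.4], "p_width": [0.2],
    "variety": ["Setosa"],
})
print(column_schema(df))
# [['s_length', 'FLOAT'], ['s_width', 'FLOAT'], ['p_length', 'FLOAT'],
#  ['p_width', 'FLOAT'], ['variety', 'STRING']]
```

Deriving the schema this way keeps the container definition in sync with the data frame if the dataset changes.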
To successfully load our data frame into the GridDB database, we will have to use the `ContainerInfo` method. This function takes the name of each attribute and defines its data type. The schema is written as key-value pairs: the key represents the name of the column, and the value is its data type. This task is achieved with the following Python lines of code:
conInfo = griddb.ContainerInfo("column1",
[["s_length", griddb.Type.FLOAT],
["s_width", griddb.Type.FLOAT],
["p_length", griddb.Type.FLOAT],
["p_width", griddb.Type.FLOAT],
["variety", griddb.Type.STRING]],
griddb.ContainerType.COLLECTION, True)
We will use a loop to convert and load our CSV file data into our GridDB database, reading every row of the file and writing its values into the newly created container. Note that the header row must be skipped and the numeric fields converted to floats to match the container schema. This task is achieved with the following Python lines of code:
import csv

filename = 'iris.csv'
with open(filename, 'r') as csvfile:
    datareader = csv.reader(csvfile)
    next(datareader)  # skip the header row
    for row in datareader:
        # convert the four numeric columns to float to match the schema
        col.put([float(v) for v in row[:4]] + [row[4]])
col.commit()
To test the performance changes under our fine-tuning configuration, we will measure the speed of the `SELECT` operation using Python. The test is conducted by calculating the time it takes to execute the read operation in our database, using the `%%time` magic command in Jupyter. The code below is the Python code that was run pre-fine-tuning:
%%time
# Run query before fine-tuning
query = col.query("select * where variety = 'Iris-versicolor'")
The Python code used to run the `SELECT` query took `1.3 ms` to execute. The following is the output of the code described above:
CPU times: user 1.47 ms, sys: 0 ns, total: 1.47 ms
Wall time: 1.3 ms
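Outside a Jupyter notebook, the same measurement can be made with the standard library's `time.perf_counter`. This is a generic sketch: the `time_query` helper is our own, and the stand-in workload below is a placeholder where the GridDB query call would go.

```python
import time

def time_query(run, repeats=100):
    """Return the best wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run()  # e.g. a lambda wrapping the GridDB query call
        best = min(best, time.perf_counter() - start)
    return best

# Placeholder workload instead of a real GridDB query:
elapsed = time_query(lambda: sum(range(1000)))
print(f"best of 100 runs: {elapsed * 1e6:.1f} µs")
```

Taking the best of several runs reduces noise from caching and scheduling, which matters when comparing sub-millisecond timings like the ones in this test.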
To increase the speed of our query, we can set the value of the transaction key `replicationMode` to `1`. This configures your database for semi-synchronous replication, which in our test resulted in better performance. This task is done as follows:
"transaction":{
"replicationMode":1
}
To test the speed increase of our GridDB database, we will run the same `SELECT` query and measure its execution time. The code below is the Python code that was run post-fine-tuning:
%%time
# Run query after fine-tuning
query = col.query("select * where variety = 'Iris-versicolor'")
This time, the `SELECT` query took `309 µs` to execute. The following is the output of the code described above:
CPU times: user 305 µs, sys: 0 ns, total: 305 µs
Wall time: 309 µs
Based on our test, we can see a clear difference in speed and performance when we fine-tune our database for the application's use case. This opens the door to experimenting with different configurations in order to find the combination that best suits your development framework and provides your users with the best experience.
Conclusion
In this article, we have seen the different ways of configuring the GridDB database. We started by opening the working configuration file and adding the fine-tuning parameters. The two areas of focus we covered were performance and availability. The first was fine-tuned by testing the different settings of `persistencyMode`; then, the `replicationMode` parameter was altered to improve our database's availability and performance. It is crucial to understand that these changes depend entirely on your database's use case. In other words, the application being developed and the data being stored should determine the configuration settings of your GridDB database.
References
- https://docs.griddb.net/gettingstarted/python/#installation
- https://www.griddb.net/en/docs/GridDB_QuickStartGuide.html#sec-2.4
- https://www.toshiba-sol.co.jp/en/pro/griddb/docs-en/v5/GridDB_SQL_TuningGuide.html
If you have any questions about the blog, please create a Stack Overflow post here https://stackoverflow.com/questions/ask?tags=griddb .
Make sure that you use the “griddb” tag so our engineers can quickly reply to your questions.