Introduction
In one of our past posts, we migrated a MySQL relational database to GridDB using Java. In this post we will continue the trend by showing, step by step, how to migrate an existing PostgreSQL database to a GridDB cluster with Python.
PostgreSQL is a relational database widely used in business intelligence, web development, commerce, healthcare, and biopharmacy. While PostgreSQL offers several benefits, it suffers from the same disadvantages as many other relational databases, including poor scalability and a lack of flexibility. PostgreSQL can also have a steep learning curve and degraded performance when handling certain datasets.
GridDB, on the other hand, provides the strong horizontal scalability and flexibility of a NoSQL database while retaining ACID compliance, high reliability, and high availability. GridDB also delivers superior performance with its in-memory architecture, making it a great fit for many applications, particularly IoT applications.
Use Case
To keep things simple, we will use the same industry example as our last migration: a photovoltaic (PV) site, which can be thought of as an industrial solar farm. Most of the data in this database relates to information and measurements sent by sensors and the facilities that contain them.
GridDB Data Model
Since PostgreSQL is a relational database very similar to Oracle Database and MySQL, we will be using the same example database as last time, so our PostgreSQL schema will be the same. Our GridDB data model will also be the same as before, except that we will be creating and populating our database using a Python API.
There will be 4 tables in the PostgreSQL database which will translate into 4 container schemas in GridDB. They are detailed below.
- Facilities: Contains information and specifications for the facilities of the PV site.
- Sensors: Stores general information, such as type and name, about the PV site's sensors.
- Readings: Stores the timestamps and values of the measurements recorded by the PV site's sensors.
- Alerts: Stores information related to alerts and notifications sent by the sensors.
CREATE TABLE IF NOT EXISTS facilities (
    facilityId VARCHAR(20) PRIMARY KEY,
    name VARCHAR(50) NOT NULL,
    specifications BYTEA
);
facilities = gridstore.put_container(
    "facilities",
    [("facilityId", griddb.GS_TYPE_STRING),
     ("name", griddb.GS_TYPE_STRING),
     ("specifications", griddb.GS_TYPE_BLOB)],
    griddb.GS_CONTAINER_COLLECTION)
CREATE TABLE IF NOT EXISTS sensors (
    sensorId VARCHAR(40) PRIMARY KEY,
    facilityId VARCHAR(20) REFERENCES facilities(facilityId) ON DELETE CASCADE ON UPDATE CASCADE,
    name VARCHAR(50) NOT NULL,
    type VARCHAR(50) NOT NULL
);
sensor_collection = gridstore.put_container(
    "facility_1",
    [("sensorId", griddb.GS_TYPE_STRING),
     ("name", griddb.GS_TYPE_STRING),
     ("type", griddb.GS_TYPE_STRING)],
    griddb.GS_CONTAINER_COLLECTION)
CREATE TABLE IF NOT EXISTS readings (
    recordId SERIAL PRIMARY KEY,
    sensorId VARCHAR(40) REFERENCES sensors(sensorId) ON DELETE CASCADE ON UPDATE CASCADE,
    ts TIMESTAMP DEFAULT NOW() NOT NULL,
    value REAL NOT NULL,
    status VARCHAR(255) NOT NULL
);
timeseries = gridstore.put_container(
    "sensor_1",
    [("timestamp", griddb.GS_TYPE_TIMESTAMP),
     ("value", griddb.GS_TYPE_DOUBLE),
     ("status", griddb.GS_TYPE_STRING)],
    griddb.GS_CONTAINER_TIME_SERIES)
CREATE TABLE IF NOT EXISTS alerts (
    alertId SERIAL PRIMARY KEY,
    ts TIMESTAMP DEFAULT NOW() NOT NULL,
    facilityId VARCHAR(20) REFERENCES facilities(facilityId) ON DELETE CASCADE ON UPDATE CASCADE,
    sensorId VARCHAR(40) REFERENCES sensors(sensorId) ON DELETE CASCADE ON UPDATE CASCADE,
    level INT NOT NULL,
    detail VARCHAR(255) NOT NULL
);
alerts_col = gridstore.put_container(
    "alerts",
    [("id", griddb.GS_TYPE_INTEGER),
     ("timestamp", griddb.GS_TYPE_TIMESTAMP),
     ("facilityId", griddb.GS_TYPE_STRING),
     ("sensorId", griddb.GS_TYPE_STRING),
     ("level", griddb.GS_TYPE_INTEGER),
     ("detail", griddb.GS_TYPE_STRING)],
    griddb.GS_CONTAINER_COLLECTION)
Performing the Migration
Now that we have our schemas and data models set up for both PostgreSQL and GridDB, we can perform our data migration. We will SELECT all the rows from each PostgreSQL table, then reformat and rearrange each one into a GridDB Row. We will then insert that row into its corresponding Collection or Timeseries container.

For reference, the PostgreSQL database being migrated is named 'pv'.
Installing the Python APIs and Connectors
First off, we will need the psycopg2 package to use Python with PostgreSQL, and the griddb_python_client package to access GridDB from Python. You can install both of these packages with the pip package installer.
$ python -m pip install psycopg2
$ python -m pip install griddb_python_client
Once installed, we can connect to our pv database in PostgreSQL using psycopg2 and to our GridDB cluster using the GridDB Python client.
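Below is a minimal connection sketch. The psycopg2 call targets the pv database, and its user, password, and host parameters are placeholders for your environment. The GridDB portion follows the factory pattern of the GridDB Python client; the address, port, cluster name, and credentials are likewise placeholders, and the exact get_store call may vary between client versions.

import psycopg2
import griddb_python_client
griddb = griddb_python_client

## Connect to the 'pv' database in PostgreSQL
## (user, password, and host are placeholders for your environment)
connection = psycopg2.connect(dbname="pv", user="postgres",
                              password="password", host="localhost")

## Connect to the GridDB cluster -- a sketch following the client's
## factory pattern; the address, port, cluster name, and credentials
## are placeholders that depend on your cluster configuration
factory = griddb.StoreFactory.get_instance()
gridstore = factory.get_store(host="239.0.0.1", port=31999,
                              cluster_name="defaultCluster",
                              username="admin", password="admin")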
Migrating the Facilities Table
Once we have our Collection container formed in GridDB for storing facilities, we can access and migrate all the rows from the facilities table in PostgreSQL. To begin, we obtain a row cursor from the facilities table in PostgreSQL; each fetch from the cursor returns one row's column values as a tuple.
All the column values are translated to their appropriate Python types, so no additional parsing or type conversion of the values is necessary. All the individual values of a row can be obtained by simply unpacking it into one variable per returned column.
Once we have all the row's column values, we can create a GridDB row object and set all its row fields accordingly. Type conversion between the databases is handled this way for types like strings and numerics. The only thing to note is that the BYTEA (BLOB) value from PostgreSQL should be converted to a bytearray before being inserted into the GridDB collection.
query = "SELECT * FROM facilites" cursor = connection.cursor() #Obtain row cursor from PostgreSQL connection cursor.execute(query) row = cursor.fetchone() while row is not None: ## Get all id, name, and specification column values from row facility_id, name, blob = row facility_row = facilities.create_row() facility_row.set_field_by_string(0,facility_id) facility_row.set_field_by_string(1,name) facility_row.set_field_by_blob(2,bytearray(blob)) ## Convert Blob data to bytearray ret = facilities.put_row(facility_row)
Migrating Sensor Data
During the migration of the facilities table, we obtain the facility_id of every facility in the database. With that facility id, we can create a sensor Collection for that facility to store information on its sensors, as sketched below.
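As a minimal sketch, assuming each container is named after its facility's id, the per-facility sensor Collection can be created inside the facilities migration loop using the same schema shown in the data model above:

## Create a Collection for this facility's sensors while migrating the
## facilities table; the container name is the facility's id (e.g. "facility_1")
sensor_collection = gridstore.put_container(
    facility_id,
    [("sensorId", griddb.GS_TYPE_STRING),
     ("name", griddb.GS_TYPE_STRING),
     ("type", griddb.GS_TYPE_STRING)],
    griddb.GS_CONTAINER_COLLECTION)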
First, we must get every row in the sensors table in PostgreSQL that has that facility id. Since the rows in our sensor containers in GridDB do not have facilityId columns, there is no need to fetch that column from PostgreSQL.
Now that we have created our Collection container for a facility’s sensors, we can go through the same process of parsing the PostgreSQL rows and inserting them into the facility’s Collection in GridDB.
facility = "facility_1" query = "SELECT sensorId,name,type FROM sensors WHERE facilityId='%s'" % (facility) cursor = connection.cursor() cursor.execute(query) row = cursor.fetchone() while row is not None: sensor_id, name, sensor_type = row sensor = sensor_collection.create_row() sensor.set_field_by_string(0,sensor_id) sensor.set_field_by_string(1,name) sensor.set_field_by_string(2,sensor_type) ret = sensor_collection.put_row(sensor)
Migrating Sensor Readings
Because we get the sensor_id of each sensor when migrating the sensors table in PostgreSQL, we can also create a Timeseries container for each sensor to store its recorded measurements (a sketch follows below). All that needs to be done is to SELECT all the rows from the readings table with that sensor id. Because Timeseries containers have timestamps as their row keys, we order the rows obtained from PostgreSQL by the TIMESTAMP column.
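As with the facility Collections, here is a minimal sketch of creating the per-sensor Timeseries container, assuming the container is named after the sensor's id:

## Create a Timeseries container for this sensor during the sensors
## migration; the container name is the sensor's id (e.g. "sensor_1")
timeseries = gridstore.put_container(
    sensor_id,
    [("timestamp", griddb.GS_TYPE_TIMESTAMP),
     ("value", griddb.GS_TYPE_DOUBLE),
     ("status", griddb.GS_TYPE_STRING)],
    griddb.GS_CONTAINER_TIME_SERIES)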
Just as with the rows of all the previous tables, we go through the same process of retrieving all the column values so they can be set as row fields in a GridDB row.
One thing to note is that TIMESTAMP, DATE, and TIME column values in PostgreSQL are returned as datetime values in Python. Timestamp row fields in GridDB can only be set in a numerical format, so the datetime object must be converted to a timestamp number. Once the conversion is complete, we can set all the row fields and insert the row into the sensor's Timeseries container in GridDB.
griddb = griddb_python_client

def datetime_to_timestamp(dt):
    ## GridDB timestamp format (all fields must be numeric):
    ## $YEAR-$MONTH-$DAYT$HOUR:$MINUTE:$SECONDZ
    time_string = dt.strftime("%Y-%m-%dT%H:%M:%SZ")  ## Format the datetime as a GridDB-style timestamp string
    timestamp = griddb.Timestamp_parse(time_string)
    return timestamp

## (snip)

## Migrate data from the 'readings' table in PostgreSQL into Timeseries containers in GridDB
sensor_id = "sensor_1"  ## Example of a sensor id in a PV site's facility
query = "SELECT ts, value, status FROM readings WHERE sensorId = %s ORDER BY ts ASC"
cursor = connection.cursor()
cursor.execute(query, (sensor_id,))  ## Parameterized query; psycopg2 handles the quoting
row = cursor.fetchone()
## Obtain the column values of each reading (sensor measurement) from the 'readings' table
while row is not None:
    date, value, status = row
    reading = timeseries.create_row()
    timestamp = datetime_to_timestamp(date)
    reading.set_field_by_timestamp(0, timestamp)
    reading.set_field_by_double(1, value)
    reading.set_field_by_string(2, status)
    ret = timeseries.put_row(reading)
    row = cursor.fetchone()  ## Advance to the next row
Migrating the Alerts Table
The alerts table in PostgreSQL maps to a single Collection container in GridDB, so the migration process is similar to that of the facilities table.
query = "SELECT * FROM alerts" cursor = connection.cursor() cursor.execute(query) row = cursor.fetchone() while row is not None: alert_id, date, facility_id, sensor_id, level, detail = row alert_notification = alerts_col.create_row() timestamp = datetime_to_timestamp(date) alert_notification.set_field_by_integer(0,alert_id) alert_notification.set_field_by_timestamp(1,timestamp) alert_notification.set_field_by_string(2,facility_id) alert_notification.set_field_by_string(3,sensor_id) alert_notification.set_field_integer(4,level) alert_notification.set_field_by_string(5,detail) ret = alerts_col.put_row(alert_notification)
Conclusion
Now that all 4 PostgreSQL tables are migrated, our migration to GridDB is complete. From there, all the benefits of GridDB's design and performance can be obtained, and the PostgreSQL data can be dropped. With GridDB containers, we get SQL qualities like ACID compliance as well as NoSQL benefits like flexibility and high scalability. All it takes is good design and proper data modeling to make the transition from relational databases to GridDB seamless.
References
- PostgreSQL version 9.2.23 was used as the PostgreSQL database.
- GridDB Community Edition 3.0.1 was used as our GridDB database.
- Python version 2.7.13 was used as the programming language to perform the migration, on the CentOS 7.3 Linux operating system.
- More information on using and accessing GridDB data with Python can be found in our API post.
- If you want to learn more about migrating relational data to GridDB, read our page that details migrating an Oracle database using the import and export functions of GridDB Standard Edition.
If you have any questions about the blog, please create a Stack Overflow post here: https://stackoverflow.com/questions/ask?tags=griddb. Make sure that you use the “griddb” tag so our engineers can quickly reply to your questions.
Does GridDB support SQL JOINs, similar to PostgreSQL?
Hello,
At the moment, GridDB does not support SQL-like JOINs, but we are adding SQL support in the next release of GridDB CE, coming soon!