Migrating from PostgreSQL to GridDB

Introduction

In one of our past posts, we migrated a MySQL relational database to GridDB using Java. In this post we will continue that trend by showing, step by step, how to migrate an existing PostgreSQL database to a GridDB cluster with Python.
PostgreSQL is a relational database that has been widely used in business intelligence, web development, commerce, healthcare, and biopharmacy. While PostgreSQL offers several benefits, it still suffers from the same disadvantages as many other relational databases, including poor horizontal scalability and a lack of schema flexibility. PostgreSQL can also have a steep learning curve and degraded performance when handling certain datasets.
GridDB, on the other hand, provides the horizontal scalability and flexibility of a NoSQL database. It also provides ACID compliance along with high reliability and availability. GridDB delivers superior performance as well with its in-memory architecture, making it a good fit for many applications, IoT applications in particular.

Use Case

To make things simple, we will use the same industry example as our last migration. The scenario is a photovoltaic (PV) site, which can be thought of as an industrial solar farm. Most of the data in this database relates to measurements sent by sensors and information on the facilities that contain them.

GridDB Data Model

Since PostgreSQL is a relational database very similar to Oracle Database and MySQL, we will use the same example database as last time, so our schema for the PostgreSQL database will be the same. Our GridDB data model will also be the same as before, except that we will create and populate our database using the Python API.

There will be 4 tables in the PostgreSQL database which will translate into 4 container schemas in GridDB. They are detailed below.

  1. Facilities: Contains information and specifications for the facilities of the PV site.

    PostgreSQL Facilities Table

    CREATE TABLE IF NOT EXISTS facilities (
    	facilityId VARCHAR(20) PRIMARY KEY,
    	name VARCHAR(50) NOT NULL,
    	specifications BYTEA
    );
               

    GridDB Facility Container Schema

    facilities = gridstore.put_container("facilities",
                    [("facilityId", griddb.GS_TYPE_STRING),
                     ("name", griddb.GS_TYPE_STRING),
                     ("specifications", griddb.GS_TYPE_BLOB)],
                    griddb.GS_CONTAINER_COLLECTION)


  2. Sensors: Stores general information, such as sensor type and name, on the PV site’s sensors.

    PostgreSQL Sensors Table

    CREATE TABLE IF NOT EXISTS sensors (
    	sensorId VARCHAR(40) PRIMARY KEY,
    	facilityId VARCHAR(20) REFERENCES facilities(facilityId)
               ON DELETE CASCADE ON UPDATE CASCADE,
    	name VARCHAR(50) NOT NULL,
    	type VARCHAR(50) NOT NULL
    );
              

    GridDB Sensor Container Schema

    sensor_collection = gridstore.put_container("facility_1",
                    [("sensorId", griddb.GS_TYPE_STRING),
                     ("name", griddb.GS_TYPE_STRING),
                     ("type", griddb.GS_TYPE_STRING)],
                    griddb.GS_CONTAINER_COLLECTION)


  3. Readings: Stores the timestamps and values of the measurements recorded by the PV site’s sensors.

    PostgreSQL Readings Table

    CREATE TABLE IF NOT EXISTS readings (
    	recordId SERIAL PRIMARY KEY,
    	sensorId VARCHAR(40) REFERENCES sensors(sensorId)
                ON DELETE CASCADE ON UPDATE CASCADE,
    	ts TIMESTAMP DEFAULT NOW() NOT NULL,
    	value REAL NOT NULL,
    	status VARCHAR(255) NOT NULL
    );
                

    GridDB Reading Container Schema

    timeseries = gridstore.put_container("sensor_1",
                    [("timestamp", griddb.GS_TYPE_TIMESTAMP),
                     ("value", griddb.GS_TYPE_DOUBLE),
                     ("status", griddb.GS_TYPE_STRING)],
                    griddb.GS_CONTAINER_TIME_SERIES)


  4. Alerts: Stores information related to alerts and notifications sent by the sensors.

    PostgreSQL Alerts Table

    CREATE TABLE IF NOT EXISTS alerts (
    	alertId SERIAL PRIMARY KEY,
    	ts TIMESTAMP DEFAULT NOW() NOT NULL,
    	facilityId VARCHAR(20) REFERENCES facilities(facilityId)
                ON DELETE CASCADE ON UPDATE CASCADE,
    	sensorId VARCHAR(40) REFERENCES sensors(sensorId)
                ON DELETE CASCADE ON UPDATE CASCADE,
    	level INT NOT NULL,
    	detail VARCHAR(255) NOT NULL
    );
                

    GridDB Alerts Container Schema

    alerts_col = gridstore.put_container("alerts",
                    [("id", griddb.GS_TYPE_INTEGER),
                     ("timestamp", griddb.GS_TYPE_TIMESTAMP),
                     ("facilityId", griddb.GS_TYPE_STRING),
                     ("sensorId", griddb.GS_TYPE_STRING),
                     ("level", griddb.GS_TYPE_INTEGER),
                     ("detail", griddb.GS_TYPE_STRING)],
                    griddb.GS_CONTAINER_COLLECTION)


Performing the Migration

Now that we have our schemas and data models set up for both PostgreSQL and GridDB, we can perform our data migration. We will SELECT all the rows from each PostgreSQL table, then reformat and rearrange each one into a GridDB row. We will then insert that row into its corresponding Collection or Timeseries container.
For reference, the PostgreSQL database being migrated is named 'pv'.

Installing the Python APIs and Connectors

First off, we will need the psycopg2 package to use Python with PostgreSQL. We will also need the griddb_python_client package to access GridDB from Python. You can install both of these packages using the pip package installer.

$ python -m pip install psycopg2
$ python -m pip install griddb_python_client

Once installed, we can connect to our pv database in PostgreSQL using psycopg2, and connect to our GridDB cluster with the GridDB Python client.
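As a rough sketch, the two connections could be set up as follows. All of the connection parameters here (host, multicast address, cluster name, credentials) are placeholder assumptions, and the `StoreFactory_getInstance`/`getStore` calls follow the legacy `griddb_python_client` examples; adjust everything to match your own cluster configuration.

```python
# Placeholder connection settings -- assumptions, not values from this post.
PG_PARAMS = {"dbname": "pv", "user": "postgres",
             "password": "postgres", "host": "localhost"}
GRIDDB_PROPS = {"notificationAddress": "239.0.0.1",  # assumed multicast defaults
                "notificationPort": "31999",
                "clusterName": "defaultCluster",
                "user": "admin",
                "password": "admin"}

def connect():
    """Open both database handles and return (connection, gridstore)."""
    import psycopg2
    import griddb_python_client as griddb
    # psycopg2 connection to the 'pv' database
    connection = psycopg2.connect(**PG_PARAMS)
    # GridDB store handle via the legacy SWIG-style factory (assumed API)
    factory = griddb.StoreFactory_getInstance()
    gridstore = factory.getStore(GRIDDB_PROPS)
    return connection, gridstore
```

In the snippets that follow, `connection` is the psycopg2 handle used to create cursors, and `gridstore` is the handle passed to `put_container`.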

Migrating the Facilities Table

Once we have our Collection container formed in GridDB for storing facilities, we can access and migrate all the rows from the facilities table in PostgreSQL. To begin, we obtain a row cursor from the facilities table in PostgreSQL. Fetching from the cursor returns one row’s column values as a tuple.
All the column values are translated to their appropriate types in Python, meaning no additional parsing or type conversion of the values is necessary. All the individual values of the row can be unpacked as long as you provide the correct number of variables for the columns returned in each row.
Once we have all the row’s column values, we can create a GridDB row object and set all the row fields accordingly. Type conversion between the databases is handled this way for types like strings and numerics. The only thing to note is that the BYTEA value from PostgreSQL should be converted to a bytearray before being inserted into the GridDB collection.

query = "SELECT * FROM facilities"
cursor = connection.cursor() # Obtain row cursor from PostgreSQL connection
cursor.execute(query)
row = cursor.fetchone()
while row is not None:
        ## Get the id, name, and specification column values from the row
        facility_id, name, blob = row
        facility_row = facilities.create_row()
        facility_row.set_field_by_string(0,facility_id)
        facility_row.set_field_by_string(1,name)
        facility_row.set_field_by_blob(2,bytearray(blob)) ## Convert BYTEA data to a bytearray
        ret = facilities.put_row(facility_row)
        row = cursor.fetchone() ## Advance to the next row

Migrating Sensor Data

During the migration of the facilities table, we obtain the facility_id of every facility in the database. With that facility id, we can create a sensor collection for that facility to store information on its sensors.
First, we must get every row in the sensors table in PostgreSQL that matches that facility id. Since the rows in our Sensor containers in GridDB do not have a facilityId column, there is no need to fetch that column from PostgreSQL.
Once we have created the Collection container for a facility’s sensors, we can go through the same process of parsing the PostgreSQL rows and inserting them into the facility’s Collection in GridDB.

facility = "facility_1"
query = "SELECT sensorId,name,type FROM sensors WHERE facilityId = %s"
cursor = connection.cursor()
cursor.execute(query, (facility,)) ## Let psycopg2 bind the parameter safely
row = cursor.fetchone()
while row is not None:
        sensor_id, name, sensor_type = row
        sensor = sensor_collection.create_row()
        sensor.set_field_by_string(0,sensor_id)
        sensor.set_field_by_string(1,name)
        sensor.set_field_by_string(2,sensor_type)
        ret = sensor_collection.put_row(sensor)
        row = cursor.fetchone() ## Advance to the next row

Migrating Sensor Readings

When migrating the sensors table in PostgreSQL, we obtain the sensor_id of each sensor, and we also create a Timeseries container for that sensor to store the sensor’s recorded measurements. All that needs to be done is to SELECT all the rows from the readings table with that sensor id. Because Timeseries containers have timestamps as their row keys, we order the rows obtained from PostgreSQL by the TIMESTAMP column.
Just as with all the rows in the previous tables, we go through the same process of retrieving all the column values so they can be set as row fields in a GridDB row.
One thing to note is that TIMESTAMP, DATE, and TIME column values in PostgreSQL are returned as datetime values in Python. Timestamp columns, or row fields, in GridDB can only be set in a numerical format. As a result, the datetime object must be converted to a timestamp number. Once the conversion is complete, we can then set all the row fields and insert the row into the Timeseries container for that sensor in GridDB.

griddb = griddb_python_client
def datetime_to_timestamp(dt):
        ## GridDB Timestamp Format: (All fields must be numerics)
        ## $YEAR-$MONTH-$DAYT$HOUR:$MINUTE:$SECONDZ
        time_string = dt.strftime("%Y-%m-%dT%XZ") # '%X' formats the time portion as HH:MM:SS
        timestamp = griddb.Timestamp_parse(time_string)
        return timestamp
## (snip)
## Migrate data from the 'readings' table in PostgreSQL into Timeseries containers in GridDB
sensor_id = "sensor_1" ## Example of a sensor id in a PV site's facility
query = "SELECT ts,value,status FROM readings WHERE sensorId = %s ORDER BY ts ASC"
cursor = connection.cursor()
cursor.execute(query, (sensor_id,)) ## Let psycopg2 bind the parameter safely
row = cursor.fetchone()
## Obtain column values from each reading row (sensor measurement) in the 'readings' table
while row is not None:
        date, value, status = row
        reading = timeseries.create_row()
        timestamp = datetime_to_timestamp(date)
        reading.set_field_by_timestamp(0,timestamp)
        reading.set_field_by_double(1,value)
        reading.set_field_by_string(2,status)
        ret = timeseries.put_row(reading)
        row = cursor.fetchone() ## Advance to the next row
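As a cross-check on the conversion: GridDB timestamps are millisecond-precision values, so assuming the client accepts milliseconds since the Unix epoch (an assumption about what `Timestamp_parse` produces), the naive PostgreSQL datetime can also be converted with the standard library alone:

```python
import calendar

def datetime_to_epoch_ms(dt):
    # Treat the naive PostgreSQL TIMESTAMP as UTC, matching the 'Z'
    # suffix used in datetime_to_timestamp above.
    return calendar.timegm(dt.utctimetuple()) * 1000 + dt.microsecond // 1000
```

This avoids the round trip through string formatting and runs on both Python 2 and Python 3.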

Migrating the Alerts Table

The alerts table in PostgreSQL maps to just one Collection container in GridDB. As a result, the migration process will be similar to that of the facilities table.

query = "SELECT * FROM alerts"
cursor = connection.cursor()
cursor.execute(query)
row = cursor.fetchone()
while row is not None:
        alert_id, date, facility_id, sensor_id, level, detail = row
        alert_notification = alerts_col.create_row()
        timestamp = datetime_to_timestamp(date)
        alert_notification.set_field_by_integer(0,alert_id)
        alert_notification.set_field_by_timestamp(1,timestamp)
        alert_notification.set_field_by_string(2,facility_id)
        alert_notification.set_field_by_string(3,sensor_id)
        alert_notification.set_field_by_integer(4,level)
        alert_notification.set_field_by_string(5,detail)
        ret = alerts_col.put_row(alert_notification)
        row = cursor.fetchone() ## Advance to the next row

Conclusion

Now that all 4 PostgreSQL tables have been migrated, our migration to GridDB is complete. From there, all the benefits of GridDB’s design and performance can be obtained, and the PostgreSQL data can be dropped. With GridDB containers, we get SQL qualities like ACID compliance as well as NoSQL benefits like flexibility and high scalability. All it takes is good design and proper data modeling to make the transition from relational databases to GridDB seamless.

Reference

  • PostgreSQL version 9.2.23 was used as the PostgreSQL database.

  • GridDB Community Edition 3.0.1 was used as our GridDB database.

  • Python version 2.7.13 was used to perform the migration, on the CentOS 7.3 Linux operating system.

  • More information on using and accessing GridDB data with Python can be found on our API post.

  • If you want to learn more about migrating relational data to GridDB, read our page that details migrating an Oracle database using the import and export functions of GridDB Standard Edition.
