Composite Row Keys and GridDB

Introduction to Composite Row keys

A composite row consists of two of more attributes that identify a unique entity in a database. Composite row keys were added in GridDB v4.3 and allow rows to be uniquely identified by multiple columns within a collection.

Why are composite row keys useful? Well in many cases, one column of the dataset isn’t unique and you either have to create a column with an incrementing integer or build a string comprised of other column’s values.

TIMESTAMP Location Data
2021-07-15 21:05:58.898 4 76.85
2021-07-15 21:05:58.898 6 63.66
2021-07-15 21:05:59.398 3 56.12
2021-07-15 21:05:59.998 4 27.79

Based on the above schema, no column could be a unique key, but if we combined the TIMESTAMP and location into a composite row key, we can uniquely identify each row. Without a row key, inserting another row with same timestamp or sensor location would append another row to the collection. If timestamp alone was used a row, the second sensor which emit at the same time would overwrite the first one. This is called key collision.

Creating a Container with a Composite Row Key

To create a collection with composite row keys, the Column Info object must be specified before setting the list of columns that will comprise the composite row key:

ContainerInfo containerInfo = new ContainerInfo();
List<columninfo> columnList = new ArrayList</columninfo><columninfo>();
columnList.add(new ColumnInfo("name", GSType.STRING));
columnList.add(new ColumnInfo("phone_number", GSType.STRING));
columnList.add(new ColumnInfo("calls", GSType.INTEGER));
containerInfo.setColumnInfoList(columnList);

containerInfo.setRowKeyColumnList(Arrays.asList(0, 1));

Collection col = store.putCollection(containerName, containerInfo, false);</columninfo>

We can then add rows to the collection:

col.setAutoCommit(false);

Row row = col.createRow();
row.setString(0, "John");
row.setString(1, "123-555-7890");
row.setInteger(2, 10);
col.put(row);
col.commit();

row = col.createRow();
row.setString(0, "John");
row.setString(1, "123-555-4567");
row.setInteger(2, 5);
col.put(row);
col.commit();

row = col.createRow();
row.setString(0, "John");
row.setString(1, "123-555-7890");
row.setInteger(2, 15);
col.put(row);
col.commit();

Querying

You can also query using the Composite Row Key using a Predicate RowKey Map. A Predicate RowKey Map allows you to specify multiple queries by using the Java HashMap interface where the container name is a key and the set of keys as values for that container are the value.

RowKeyPredicate<Row.Key> predicate = RowKeyPredicate.create(containerInfo);
Map<String, RowKeyPredicate<Row.Key>> predMap = new HashMap<String, RowKeyPredicate<Row.Key>>();

Row.Key rowKey = store.createRowKey(containerInfo);
rowKey.setString(0,  "John");
rowKey.setString(1,  "123-555-7890");
predicate.add(rowKey);

predMap.put(containerName, predicate);

Map<String, List<row>> result = store.multiGet(predMap);

Row rrow = result.get(containerName).get(0);
System.out.println(rrow.getString(0)+" "+rrow.getString(1)+" "+rrow.getInteger(2)); 
    </row>

Caveats

The Composite Row Key implementation in GridDB has some limitations, mainly that it is only available in the Java API. Querying a container with Python or other languages will result in an exception. You can query collections with composite row keys with the JDBC connector but inserting or updating rows isn’t possible.

Composite Row Keys aren’t available in Time Series containers but you can use a TIMESTAMP as part of a composite row key.

Potential Uses

In a previous blog, we analyzed the NYC Taxi Commission’s data. In that blog, we ran into row key collisions because there were no unique identifiers in the dataset. Lacking unique identifiers and also the lacking row key feature could cause errors by including duplicate trips if the data was ever imported more than once.

There was another article we did on Time Series Data Forecasting in which we experienced a similar phenomenon. In that tutorial, there was an intermediate container which stored aggregations which had an implicit row key collision which could have been prevented by using composite row keys.

In that dataset, each of the ~125 precincts monthly statistics were stored. So when planning out the schema, we couldn’t just use the precinct as a row key as we wanted to store multiple months of data for one precinct. We also couldn’t use the timestamp as it would have conflict between precincts, so to prevent row key collisions and duplicated data, we set the row key to a string column that was the precinct and timestamp concatenated. This was obviously a very rudimentary and roundabout way of solving an issue that GridDB’s composite row key feature would have easily addressed.

Conclusion

Composite Row Keys are a useful mechanism to prevent conflicts when other alternatives don’t meet an application’s requirements when creating a data design. They present a more elegant solution than managing incremented integers or appending multiple strings together but given GridDB’s flexibility and overall Key-Container data architecture, the author feels they should be avoided unless necessary.

For example, oftentimes the composite row key can be avoided by giving each sensor ID its own container and having the container name be the sensor’s ID. In this case, most IoT use cases can realistically avoid this situation. But sometimes it is unavoidable — and if it is, you can rest easy knowing the composite row exists as an option.

The complete source code in this blog can be found on GridDB.net’s GitHub repository here.