Introduction and Purpose
The Use Case
Although we often think of the Internet of Things as something that coordinates moving entities. In fact, many companies in the industry deal with stationary devices. One example is a smart metering system that measures power consumption in a building facility and delivers time series data in the granularity that allows developing power-saving strategies.
We will create a business report that compares power consumption in the last seven days split per day of the week against the average consumption per day of the week in the whole previous period of time.
This approach aims to detect anomalies, unveil power consumption patterns, and routinely monitor increases and decreases in power consumption that may result in an unintended high cost.
The article will explain how to build an advanced SQL query that transforms time series data from the GridDB database into an aggregation.
The query will use a few essential SQL time functions. A detailed commentary on their utilization will be provided.
The tutorial will walk you through the process of creating a query with multiple nested subqueries and explain how to combine them without losing an overview of the whole structure.
To use the materials from this article, you need to install an SQL client, for instance, SQLWorkbench/J as described in one of the previous GridDB blog posts here.
The smart meters produce a timestamp twice in a minute. The database has only two columns: datetime in the timestamp format and power in a float format.
Many single entries in the database have very small values in the power column, and there are 2,880 entries being produced each day. It is quite challenging to make any meaningful conclusions without an aggregation.
On the other side, it seems quite intuitive that power consumption varies depending on the day of the week: we do not need much power during the week at our homes, but on the weekend, we use a lot of it. On the contrary, the office building would demonstrate high consumption on the weekdays and low consumption on Saturday and Sunday. That’s why we stick to the day of the week as an important dimension in our report.
Let us first check the raw data, though.
SELECT datetime, power FROM power2 LIMIT 10
Extracting day of the week
GridDB supports the native SQL EXTRACT() function that helps to separate components of the date, including year, month, day, day of week, etc.
EXTRACT() returns a value in the numeric format. Since we will be using day of week (further referred to as DOW) for aggregating the data, we directly convert it into the string format. An aggregation dimension is not allowed in any format that can be aggregated itself.
We simply wrap one function into the other in the query to get the necessary result.
SELECT CAST(EXTRACT(DAY_OF_WEEK, datetime) as STRING) AS DOW, power FROM power2 LIMIT 10
It’s time for the first subquery: SQL won’t let us group by DOW right here after we’ve just created this new column. We “nest” the first query into the new one that we further use for aggregation.
SELECT DOW, SUM(power) as power FROM (SELECT power, CAST(EXTRACT(DAY_OF_WEEK, datetime) as STRING) AS DOW FROM power2 ) AS TEST GROUP BY DOW
Average Consumption per Each Day of Week
We need to keep in mind that we have summed up power consumption for the whole period of time available in the database. We cannot really compare it against only one-week consumption. Nor can we just calculate an average instead of the sum, since this would produce a mean value for the single entries in the table but not an average per day of week.
To solve this problem, we need to count how many times each day of week appears in the data to use it as a divider. We count the number of rows with the SQL native COUNT() function. By doing so, we get the number of timestamps produced on a day, and then we divide it by 2 * 60 * 24 to get the number of days.
SELECT DOW, COUNT(power) / (2 * 60 * 24) AS count_days, SUM(power) as power FROM (SELECT power, CAST(EXTRACT(DAY_OF_WEEK, datetime) as STRING) AS DOW FROM power2 ) AS TEST GROUP BY DOW
We do not need one more output column in the result, so we move the expression counting the unique DOWs and calculate the real average power consumption per day of week straightforwardly in the main query.
SELECT DOW, SUM(power) / (COUNT(power) / (2 * 60 * 24)) as power FROM (SELECT power, CAST(EXTRACT(DAY_OF_WEEK, datetime) as STRING) AS DOW FROM power2 ) AS TEST GROUP BY DOW
Filter the Last Seven Days
Now, when we are set with the first part of our comparison, let’s move to finding out power consumption per day of week for the last seven days. We can use the previous query but we do not need to calculate the number of days: Each DOW appears only once.
Instead, we have to limit the data to the last seven days. How to look back into the past?
We use NOW() as our starting point. NOW() produces the timestamp with the date and time of the moment when the query is performed.
We further utilize the TIMESTAMP_ADD() function that is supported by GridDB. This one transforms a timestamp value. TIMESTAMP_ADD() can change different parts of it: year, month, day, hour, second, and millisecond. Although it looks like it can only add something, it can also do the reverse thing. For this purpose, we specify the argument as a negative number.
SELECT DOW, SUM(power) as power FROM (SELECT power, CAST(EXTRACT(DAY_OF_WEEK, datetime) as STRING) AS DOW FROM power2 WHERE datetime > TIMESTAMP_ADD(DAY, NOW(), -7) AND datetime < = NOW() ) AS TEST GROUP BY DOW</code>
We have added a double WHERE clause to filter the data. Just in case the data have some strange values with the timestamps lying in the future, we sort them out by specifying the upper limit as NOW().
Match the Results by Using the SQL Join
We will use the SQL JOIN command to glue two queries together and match power consumption for both time periods. To differentiate between the origins of the columns in our output result, we will do some renaming. Each column in the first query will get _alltime at the end of it and the columns in the second query will receive _lastweek as a label.
SQL JOIN requires us to give aliases to the subqueries, too, to be able to specify the key: A column that will be used to match values from both subqueries.
SELECT * FROM (SELECT DOW as DOW_alltime, SUM(power) / (COUNT(power) / (2 * 60 * 24)) as power_alltime FROM (SELECT power, CAST(EXTRACT(DAY_OF_WEEK, datetime) as STRING) AS DOW FROM power2 ) AS TEST_alltime GROUP BY DOW_alltime) as ALLTIME JOIN (SELECT DOW as DOW_lastweek, SUM(power) as power_lastweek FROM (SELECT power, CAST(EXTRACT(DAY_OF_WEEK, datetime) as STRING) AS DOW FROM power2 WHERE datetime > TIMESTAMP_ADD(DAY, NOW(), -7) AND datetime < = NOW() ) AS TEST_lastweek GROUP BY DOW_lastweek) as LASTWEEK ON ALLTIME.DOW_alltime = LASTWEEK.DOW_lastweek</code>
We are set to proceed to the final report!
The Final Report
We will do some prettifying to make the final table more readable and possible anomalies more eye-catching:
- get rid of one of the DOW-columns since they are otherwise identical
- rename DOW values to make them look human
- calculate a delta consumption as the difference between the last-week consumption and the overall average consumption.
- add a column with only two values – either “increased” or “decreased” – to make the differences stand out and easy to classify
- last but not least, change the order of the columns, starting with the DOW on the very left, followed by the new semi-grouping column, and then by all numeric columns
Just look at the monster query we’ve got in the end!
SELECT CASE DOW_alltime WHEN '1' THEN 'Monday' WHEN '2' THEN 'Tuesday' WHEN '3' THEN 'Wednesday' WHEN '4' THEN 'Thursday' WHEN '5' THEN 'Friday' WHEN '6' THEN 'Saturday' WHEN '0' THEN 'Sunday' end as day_of_week, case when (power_alltime - power_lastweek) > 0 then 'descreased' when (power_alltime - power_lastweek) < 0 then 'increased' else 'the same' end as status, power_alltime, power_lastweek, power_alltime - power_lastweek as power_usage_delta FROM (SELECT SUM(power) / (COUNT(power) / (2 * 60 * 24)) as power_alltime, DOW as DOW_alltime FROM (SELECT power, CAST(EXTRACT(DAY_OF_WEEK, datetime) as STRING) AS DOW FROM power2 ) AS TEST_alltime GROUP BY DOW_alltime) as ALLTIME JOIN (SELECT SUM(power) as power_lastweek, DOW as DOW_lastweek FROM (SELECT power, CAST(EXTRACT(DAY_OF_WEEK, datetime) as STRING) AS DOW FROM power2 WHERE datetime > TIMESTAMP_ADD(DAY, NOW(), -7) AND datetime < = NOW() ) AS TEST_lastweek GROUP BY DOW_lastweek) as LASTWEEK ON ALLTIME.DOW_alltime = LASTWEEK.DOW_lastweek</code>
This kind of report helps to keep an eye on any undesirable changes in the power consumption patterns.
It allows to pick up anomalies, such as unexpectedly high power consumption on a particular day of week, quickly. The report can be exported and sent to any interested persons to help them recognize the need for action and facilitate decision-making.
Advantages of Using Open-Source Tools for Building Business Reports
Despite that the numerous paid business insights tools are available nowadays, old (like SQL) and new (like GridDB) open-source ones can provide the same quality and even more flexibility for building sophisticated reports.
GridDB supports a wide range of SQL commands with the focus on time functions. GridDB can host vast amounts of data without query performance suffering a millisecond of a delay.
A Short Afterward
We hope that this tutorial also helped you to catch the logic of writing advanced SQL queries and may encourage you to use them on a routine basis.