I’m working with GridDB to store and query time-series data. The data represents sensor readings taken every minute and is stored in a container with the following structure:
CREATE TABLE sensor_readings (
timestamp TIMESTAMP,
sensor_id INT,
reading DOUBLE
);
I need to write a SQL query that calculates a rolling sum of the reading values over a 30-minute window for each sensor. The challenge is that the dataset contains millions of rows, and I want to optimize the query to ensure it runs efficiently on GridDB.
Specifically, I want the following:
- Ensure that the query can be run efficiently even as the dataset grows, ideally using any GridDB-specific features like partitioning, window functions, or aggregation optimizations.
- Calculate the rolling sum of reading for each sensor_id over the last 30 minutes.
Here’s what I’ve tried so far:
SELECT sensor_id, timestamp,
SUM(reading) OVER (PARTITION BY sensor_id
ORDER BY timestamp
RANGE BETWEEN INTERVAL '30 MINUTE' PRECEDING AND CURRENT ROW) AS rolling_sum
FROM sensor_readings;
However, this query is not performing well, especially on larger datasets. I suspect it’s because GridDB is not handling the rolling sum efficiently across the large time series.
Can anyone suggest how to optimize this query, possibly by using GridDB-specific indexing, partitioning strategies, or SQL techniques for better performance? Also, are there any GridDB-specific features that could help me with time-based rolling aggregations?
nadeshara is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.