Recently we switched to timescaledb for our timeseries data. Basic quries seem to work fine but we have an engine simulator that needs to fetch data from multiple tables, aggregate them and sort them before streaming. However, the below code takes more than a minute which is a problem since we need to stream at least a year’s worth of data. Below is my query:
<code>WITH f_md AS (
SELECT
raw_message,
datetime
FROM my_f_data
WHERE
id=ANY(ARRAY['V03', 'V06'])
AND
datetime BETWEEN
TIMESTAMP '2024-03-20 06:50:00' AND
TIMESTAMP '2024-03-20 13:00:00'
), s_md AS (
SELECT
raw_message,
datetime
FROM my_s_data
WHERE
id=ANY(ARRAY['S107', 'S057'])
AND
datetime BETWEEN
TIMESTAMP '2024-03-20 10:50:00' AND
TIMESTAMP '2024-03-20 20:00:00'
), all_data AS (
SELECT * FROM f_md
UNION ALL
SELECT * FROM s_md
)
SELECT * FROM all_data
ORDER BY datetime
</code>
<code>WITH f_md AS (
SELECT
raw_message,
datetime
FROM my_f_data
WHERE
id=ANY(ARRAY['V03', 'V06'])
AND
datetime BETWEEN
TIMESTAMP '2024-03-20 06:50:00' AND
TIMESTAMP '2024-03-20 13:00:00'
), s_md AS (
SELECT
raw_message,
datetime
FROM my_s_data
WHERE
id=ANY(ARRAY['S107', 'S057'])
AND
datetime BETWEEN
TIMESTAMP '2024-03-20 10:50:00' AND
TIMESTAMP '2024-03-20 20:00:00'
), all_data AS (
SELECT * FROM f_md
UNION ALL
SELECT * FROM s_md
)
SELECT * FROM all_data
ORDER BY datetime
</code>
WITH f_md AS (
SELECT
raw_message,
datetime
FROM my_f_data
WHERE
id=ANY(ARRAY['V03', 'V06'])
AND
datetime BETWEEN
TIMESTAMP '2024-03-20 06:50:00' AND
TIMESTAMP '2024-03-20 13:00:00'
), s_md AS (
SELECT
raw_message,
datetime
FROM my_s_data
WHERE
id=ANY(ARRAY['S107', 'S057'])
AND
datetime BETWEEN
TIMESTAMP '2024-03-20 10:50:00' AND
TIMESTAMP '2024-03-20 20:00:00'
), all_data AS (
SELECT * FROM f_md
UNION ALL
SELECT * FROM s_md
)
SELECT * FROM all_data
ORDER BY datetime
The tables are hypertables and timestamp is in microsecond granularity. No primary keys and datetime is the sorting key.
Any suggestion on how to improve this would be much appreciated.