I need to write an efficient query from my backend application so that I may know how many online sessions are there (count) per day for a given date range.
Sample input:
Year: 2020, month: 08, timezone: “Asia/Dhaka”. <- timezone can be any timezone passed from backend.
Sample output:
[
{"date": "2020-08-01", "online_session_count": 3},
{"date": "2020-08-02", "online_session_count": 0},
{"date": "2020-08-03", "online_session_count": 4},
...... other rows ......
{"date": "2020-08-31", "online_session_count": 1},
]
Here is the table definition:
CREATE TABLE online_speakers (
id INT AUTO_INCREMENT PRIMARY KEY,
speaker_name VARCHAR(255) NOT NULL,
start_time DATETIME NOT NULL,
end_time DATETIME NOT NULL
);
Here are some dummy rows:
INSERT INTO online_speakers (speaker_name, start_time, end_time) VALUES
('Speaker 1', '2020-08-01 10:00:00', '2020-08-01 11:00:00'),
('Speaker 2', '2020-08-01 14:00:00', '2020-08-01 15:00:00'),
('Speaker 3', '2020-08-02 09:00:00', '2020-08-02 10:00:00'),
('Speaker 4', '2020-08-03 10:00:00', '2020-08-03 11:00:00'),
('Speaker 5', '2020-08-03 13:00:00', '2020-08-03 14:00:00'),
('Speaker 6', '2020-08-03 15:00:00', '2020-08-03 16:00:00'),
('Speaker 7', '2020-08-04 11:00:00', '2020-08-04 12:00:00'),
('Speaker 8', '2020-08-04 13:00:00', '2020-08-04 14:00:00'),
('Speaker 9', '2020-08-05 10:00:00', '2020-08-05 11:00:00'),
('Speaker 10', '2020-08-05 15:00:00', '2020-08-05 16:00:00');
And here is the query that I wrote:
WITH RECURSIVE date_range AS (
SELECT DATE('2020-08-01') AS date
UNION ALL
SELECT DATE_ADD(date, INTERVAL 1 DAY)
FROM date_range
WHERE DATE_ADD(date, INTERVAL 1 DAY) <= '2020-08-31'
)
SELECT
date_range.date,
COALESCE(session_tbl.online_session_count, 0) AS online_session_count
FROM
date_range
LEFT JOIN (
SELECT
DATE(CONVERT_TZ(start_time, 'UTC', 'Asia/Dhaka')) AS date,
COUNT(*) AS online_session_count
FROM
online_speakers
WHERE
start_time BETWEEN CONVERT_TZ('2020-08-01 00:00:00', 'Asia/Dhaka', 'UTC')
AND CONVERT_TZ('2020-08-31 23:59:59', 'Asia/Dhaka', 'UTC')
GROUP BY
date
ORDER BY
date
) session_tbl ON session_tbl.date = date_range.date;
As you can see, the where
clause is using aggregating functions, which is very inefficient for large tables containing millions of rows. In such scenarios, how do experts tackle the problem?
Any suggestions/improvement to my existing query is also welcome.
1