I have a table where each record represents a transaction by a user. I need to find out if any user has inserted 1000 or more records within a 1 hour time window on a specific date. For example, if a user inserts more than 1000 records between 12:15 PM to 01:15 PM, I want to capture that information.
user_code | timestamp |
---|---|
John | 2024-12-01 01:17:05 |
John | 2024-12-01 01:17:35 |
John | 2024-12-01 01:18:05 |
John | 2024-12-01 01:18:35 |
John | 2024-12-01 01:19:05 |
John | 2024-12-01 01:19:35 |
John | 2024-12-01 01:20:05 |
Alex | 2024-12-01 01:30:15 |
Alex | 2024-12-01 01:30:45 |
Alex | 2024-12-01 01:31:15 |
Alex | 2024-12-01 01:31:45 |
Alex | 2024-12-01 01:32:15 |
Alex | 2024-12-01 01:32:45 |
Alex | 2024-12-01 01:33:15 |
What I’ve Tried:
I’ve tried following a basic approach with the help of ChatGPT, but the query generated, even by AI, is not working as expected. The sliding window calculation seems to be incorrect in its handling, and I’m not able to capture the data as intended.
Approach:
- For each user, calculate a sliding window (1 hour before and 1 hour
after each transaction). - Count the number of records within each window.
- Return only the windows where the count of records is 1000 or more.
I am looking for help to write the correct Oracle SQL query to implement the above logic. If anyone has any suggestions, or a correct query to achieve this, it would be greatly appreciated!
4
Use an analytic function with a range window:
SELECT user_code,
timestamp
FROM (
SELECT user_code,
timestamp,
COUNT(*) OVER (
PARTITION BY user_code
ORDER BY timestamp
RANGE BETWEEN INTERVAL '0' HOUR PRECEDING
AND INTERVAL '1' HOUR FOLLOWING
) AS cnt
FROM table_name
)
WHERE cnt >= 1000
Which, for the sample data:
CREATE TABLE table_name (user_code, timestamp) AS
SELECT 'Alice',
TIMESTAMP '2024-01-01 00:00:00' + INTERVAL '3' SECOND * (LEVEL - 1)
FROM DUAL
CONNECT BY LEVEL <= 1003
UNION ALL
SELECT 'Beryl',
TIMESTAMP '2024-01-01 00:00:00' + INTERVAL '3' SECOND * (LEVEL - 1)
FROM DUAL
CONNECT BY LEVEL <= 500
UNION ALL
SELECT 'Beryl',
TIMESTAMP '2024-01-01 00:00:00' + INTERVAL '3' SECOND * (LEVEL + 501)
FROM DUAL
CONNECT BY LEVEL <= 500
UNION ALL
SELECT 'Carol',
TIMESTAMP '2024-01-01 00:00:00' + INTERVAL '3' SECOND * (LEVEL - 1)
FROM DUAL
CONNECT BY LEVEL <= 900
Outputs the start of the hour ranges which contain 1000 or more rows:
USER_CODE | TIMESTAMP |
---|---|
Alice | 2024-01-01 00:00:00.000000000 |
Alice | 2024-01-01 00:00:03.000000000 |
Alice | 2024-01-01 00:00:06.000000000 |
Alice | 2024-01-01 00:00:09.000000000 |
Beryl | 2024-01-01 00:00:00.000000000 |
If you want to return the 1-hour ranges then, from Oracle 12, you can use MATCH_RECOGNIZE
:
SELECT *
FROM table_name
MATCH_RECOGNIZE(
PARTITION BY user_code
ORDER BY timestamp
MEASURES
FIRST(timestamp) AS timestamp_start,
LAST(timestamp) AS timestamp_end,
COUNT(timestamp) AS cnt
PATTERN (start_row within_1_hour+)
DEFINE
within_1_hour AS timestamp <= FIRST(timestamp) + INTERVAL '1' HOUR
)
WHERE cnt >= 1000
Which, for the sample data, outputs:
USER_CODE | TIMESTAMP_START | TIMESTAMP_END | CNT |
---|---|---|---|
Alice | 2024-01-01 00:00:00.000000000 | 2024-01-01 00:50:06.000000000 | 1003 |
Beryl | 2024-01-01 00:00:00.000000000 | 2024-01-01 00:50:03.000000000 | 1000 |
fiddle
3
Maybe this could help you:
-- S a m p l e D a t a :
CREATE TABLE tbl (user_code, timestamp) AS
SELECT 'Alice', TIMESTAMP '2024-01-01 00:00:00' + INTERVAL '20' MINUTE * (LEVEL - 1)
FROM DUAL CONNECT BY LEVEL <= 9
UNION ALL
SELECT 'Beryl', TIMESTAMP '2024-01-01 00:00:00' + INTERVAL '20' MINUTE * (LEVEL - 1)
FROM DUAL CONNECT BY LEVEL <= 4
UNION ALL
SELECT 'Beryl', TIMESTAMP '2024-01-01 00:00:00' + INTERVAL '20' MINUTE * (LEVEL + 1)
FROM DUAL CONNECT BY LEVEL <= 3
UNION ALL
SELECT 'Carol', TIMESTAMP '2024-01-01 00:00:00' + INTERVAL '20' MINUTE * (LEVEL - 1)
FROM DUAL CONNECT BY LEVEL <= 2
I need to find out if any user has inserted 1000 or more records within a 1 hour time window on a specific date
Create a CTE (grid) to get a date and timeframes of 1 hour before and 1 hour after the current row. Then count previous and following rows per user and date …
WITH
grid AS
( SELECT user_code, timestamp, a_date, prev_hour, next_hour,
Count(*) Over(Partition By user_code, a_date
Order By timestamp
Rows Between Unbounded Preceding And Current Row) as cnt_prev,
Count(*) Over(Partition By user_code, a_date
Order By timestamp
Rows Between Current Row And Unbounded Following) as cnt_next
FROM ( Select user_code, timestamp, Trunc(timestamp) as a_date,
timestamp - INTERVAL '1' HOUR as prev_hour,
timestamp + INTERVAL '1' HOUR as next_hour
From tbl
)
WHERE timestamp Between prev_hour And timestamp
OR
timestamp Between timestamp And next_hour
)
… now get Max counts bigger than some value (here it is 8 – you put 1000) and group by columns that you want to (here it is user_code, a_date and a_time – that could be just the user_code if you want)
-- M a i n S Q L :
Select user_code, a_date, To_Char(timestamp, 'hh24:mi:ss') as a_time, Max(cnt_prev) as cnt_prev, Max(cnt_next) as cnt_next
From grid
Group By user_code, a_date, To_Char(timestamp, 'hh24:mi:ss')
HAVING Max(cnt_prev) > 8 OR Max(cnt_next) > 8
Order By user_code, a_date, To_Char(timestamp, 'hh24:mi:ss')
R e s u l t :
USER_CODE | A_DATE | A_TIME | CNT_PREV | CNT_NEXT |
---|---|---|---|---|
Alice | 01-JAN-24 | 00:00:00 | 1 | 9 |
Alice | 01-JAN-24 | 02:40:00 | 9 | 1 |
fiddle