I am working with forecourt machine data (e.g. jet wash / air) that accepts coins to operate. Each coin is a transaction line. However, I want to group up multiple rows into one ‘transaction’ if the coins went in less than 3 minutes apart.
I thought to use LAG function to tell me if the previous transaction is less than 3 minutes ago, but now I’m not sure how to group up the multiple rows (as it could be 1 previous row or many previous rows).
How would you tackle this problem?
Here’s what I tried so far:
select t1.*,
datediff(second, PreviousVendDateTime, VendDateTime) as SecondsDiff
case when datediff(second, PreviousVendDateTime, VendDateTime) < 360 then 1 else 0 end as SameTransaction
from
(select mct.TransactionID,
mct.MachineID,
mct.VendDateTime,
mct.Amount,
lag(mct.VendDateTime, 1)
over(partition by mct.MachineID order by mct.VendDateTime as PreviousVendDateTime
from Machines as m
join MCI_CashTrans as mct
on mct.MachineID = m.MachineID
and mct.MachineSerial = m.MachineSerial
join Account as a
on m.AccountNo = a.AccountNo
join MachineType as mt
on m.MachineTypeID = mt.MachineTypeID) as t1
order by MachineID, VendDateTime
And a sample of the results:
TransactionID | MachineID | VendDateTime | Amount | PreviousVendDateTime | SecondsDiff | SameTransaction |
---|---|---|---|---|---|---|
2518059219 | 777 | 2023-06-01 19:01:54 | 1.00 | 2023-06-01 18:53:58 | 476 | 0 |
2519059559 | 777 | 2023-06-01 19:02:09 | 1.00 | 2023-06-01 19:01:54 | 15 | 1 |
2518357022 | 777 | 2023-06-01 21:20:41 | 1.00 | 2023-06-01 19:02:09 | 8312 | 0 |
2518362875 | 777 | 2023-06-01 21:23:01 | 1.00 | 2023-06-01 21:20:41 | 140 | 1 |
2518369251 | 777 | 2023-06-01 21:29:12 | 1.00 | 2023-06-01 21:23:01 | 371 | 0 |
2518369599 | 777 | 2023-06-01 21:29:38 | 1.00 | 2023-06-01 21:29:12 | 26 | 1 |
2518369737 | 777 | 2023-06-01 21:29:51 | 1.00 | 2023-06-01 21:29:38 | 13 | 1 |
2518369894 | 777 | 2023-06-01 21:30:04 | 0.50 | 2023-06-01 21:29:51 | 13 | 1 |
2518370027 | 777 | 2023-06-01 21:30:17 | 0.50 | 2023-06-01 21:30:04 | 13 | 1 |
2518370171 | 777 | 2023-06-01 21:30:31 | 1.00 | 2023-06-01 21:30:17 | 14 | 1 |
2518370338 | 777 | 2023-06-01 21:30:44 | 1.00 | 2023-06-01 21:30:31 | 13 | 1 |
2518404884 | 777 | 2023-06-01 21:46:11 | 1.00 | 2023-06-01 21:30:44 | 927 | 0 |
4
Revert the case
to get a start of transaction flag and to compute a tranNo as a sum of flags
with starts as(
select t1.*,
datediff(second, PreviousVendDateTime, VendDateTime) as SecondsDiff
case when datediff(second, PreviousVendDateTime, VendDateTime) >= 360 then 1 else 0 end as tranStart
from
(select mct.TransactionID,
mct.MachineID,
mct.VendDateTime,
mct.Amount,
lag(mct.VendDateTime, 1)
over(partition by mct.MachineID order by mct.VendDateTime as PreviousVendDateTime
from Machines as m
join MCI_CashTrans as mct
on mct.MachineID = m.MachineID
and mct.MachineSerial = m.MachineSerial
join Account as a
on m.AccountNo = a.AccountNo
join MachineType as mt
on m.MachineTypeID = mt.MachineTypeID) as t1
)
select starts.*,
sum(tranStart) over(partition by MachineID order by VendDateTime) trnNo
Then you can group rows by MachineID
and trnNo
I’d give this a try;
In a single statement, use ‘with’ and lag / lead window functions to produce
-
a cte A with machineId, timeStamp, isStart partitioned by machineId, sorted by timestamp where isStart 1 indicates is more than 3 mins ahead of previous in partition, otherwise = 0
-
a cte B = subset of A where isStart = 1 partitioned by machineId, sorted by timestamp with nextStart = value for timestamp of subsequent row in partition.
-
a join between your main data D and B on D.machineId = B.machineID and D.timestamp >= B.timestamp and
D.timestamp < B.nextStart, grouped by machineId, B.timestamp.
If there’s a lot of data I might use temp tables and indexing or even use an alternative to SQL.