I have a table containing one row per supplier per day, like this:
Supplier | Day |
---|---|
Bob | 01/03 |
Bob | 02/03 |
Bob | 03/03 |
And another table which contains all the payments made by each supplier, like this:
Supplier | Payment amount | Day |
---|---|---|
Bob | 15 | 02/03 |
Bob | 21 | 28/03 |
I need to add a column to the first table showing what was the date of the latest payment, for each supplier for each date. The final output will be just like the first table but adding one field called “latest payment date”.
I have tried joining the tables and then filtering to the latest record using a window function, something like this:
SELECT
supplier,
day,
payment
FROM suppliers LEFT JOIN payments
ON suppliers.supplier = payments.supplier
AND supplier.day >= payments.day
QUALIFY ROW_NUMBER() OVER (PARTITION BY supplier ORDER BY day DESC) = 1
This works for a small dataset, but I need to run it on millions of rows, and it hits the BQ resource limit.
Is there any way around it? I have been stuck on this for a while. Any help would be much appreciated.
3
Select supplier, day, max(payment) as last_paid
From suppliers LEFT JOIN payments
ON suppliers.supplier = payments.supplier
AND supplier.day >= Coalesce(payments.day,'0001-01-01')
Group By supplier, day
You need the Coalesce function to make sure the unpaid billing days records are included in the result. The comparison to NULL is always False and would defeat the purpose of the Left Join without the Coalesce.
3
You did not say at all about the size of the tables, the number of payments per day for each supplier. Also, if we talk about performance and resources, you need to know the available indexes on the tables.
Therefore, here is a simple example of a query. It is possible to try and compare different variants.
And also look at the execution plans.
See example
Supplier | Day |
---|---|
Bob | 2024-03-01 |
Bob | 2024-03-02 |
Bob | 2024-03-03 |
Supplier | amount | Day |
---|---|---|
Bob | 15 | 2024-03-02 |
Bob | 21 | 2024-03-28 |
SELECT s.supplier,s.day
,(select max(day) from payments p
where p.supplier=s.supplier and p.day<=s.day) paydate
FROM suppliers s
supplier | day | paydate |
---|---|---|
Bob | 2024-03-01 | null |
Bob | 2024-03-02 | 2024-03-02 |
Bob | 2024-03-03 | 2024-03-02 |
fiddle
1