I have a SQL table containing profit for a customer group, customer, project and year. I want to add the sum of the profit of the previous year for the current customer group and the sum of the profit of the current year for the current customer group to each row of the table.
For example, assuming I have this as my input data:
customer_group | customer | project | year | profit |
---|---|---|---|---|
A | A1 | PA11 | 2018 | 2 |
A | A2 | PA21 | 2019 | 47 |
A | A2 | PA21 | 2019 | 12 |
A | A1 | PA11 | 2020 | 70 |
B | B1 | PB11 | 2018 | 0 |
B | B2 | PB21 | 2020 | 5 |
B | B1 | PB12 | 2021 | 23 |
C | C1 | PC11 | 2017 | 1 |
C | C1 | PC12 | 2017 | 4 |
C | C2 | PC21 | 2018 | 10 |
C | C2 | PC22 | 2018 | 6 |
C | C3 | PC33 | 2020 | 11 |
I want the output to be this:
customer_group | customer | project | year | profit | total_profit_current_year | total_profit_last_year |
---|---|---|---|---|---|---|
A | A1 | PA11 | 2018 | 2 | 2 | null |
A | A2 | PA21 | 2019 | 47 | 59 | 2 |
A | A2 | PA21 | 2019 | 12 | 59 | 2 |
A | A1 | PA11 | 2020 | 70 | 70 | 59 |
B | B1 | PB11 | 2018 | 0 | 0 | null |
B | B2 | PB21 | 2020 | 5 | 5 | null |
B | B1 | PB12 | 2021 | 23 | 23 | 5 |
C | C1 | PC11 | 2017 | 1 | 5 | null |
C | C1 | PC12 | 2017 | 4 | 5 | null |
C | C2 | PC21 | 2018 | 10 | 16 | 5 |
C | C2 | PC22 | 2018 | 6 | 16 | 5 |
C | C3 | PC33 | 2020 | 11 | 11 | null |
I first tried to do this with window functions like lag, but the problem is that that does not look at the entire customer group in the previous year, just at the previous row for the previous year customer group, which has just part of the profit.
I also tried using a WITH clause to sum profits over customer groups and years, but left-joining this subquery 2 times gives with the original data gives very bad performance (the real data is pretty large, having > 1M rows)
Is there an efficient solution for this? Ideally I would need something beahving like a lag over a sum, but I’m not sure if this is possible.
Thanks in advance