I have a table that contains purchase information by account, purchase date, and the sku purchased on that day. The table os a sales order line item table, so that each record represents a separate single unity item purchase (i.e. quantity = 1 for all records).
I want to create a table on the output that will tell me, as of a particular date, what’s the total number of unique skus that a given account has purchased as of that date, starting to count from the accounts first purchase date.
For example, using the data below
create schema adhoc_data.temp;
create table adhoc_data.temp.purchases ( accountid varchar, purchase_date date, productid varchar);
insert into purchases (accountid, purchase_date , productid) values ('a1', '2022-01-01', '534ad451f');
insert into purchases (accountid, purchase_date , productid) values ('a1', '2022-01-02', '534ad451f');
insert into purchases (accountid, purchase_date , productid) values ('a1', '2022-01-03', '534ad451f');
insert into purchases (accountid, purchase_date , productid) values ('a1', '2022-01-04', '534ad451f');
insert into purchases (accountid, purchase_date , productid) values ('a1', '2022-01-05', '0f9d321ad');
insert into purchases (accountid, purchase_date , productid) values ('a1', '2022-01-06', '0f9d321ad');
insert into purchases (accountid, purchase_date , productid) values ('a1', '2022-01-07', '534ad451f');
insert into purchases (accountid, purchase_date , productid) values ('a1', '2022-01-08', '4a5d93a1f');
insert into purchases (accountid, purchase_date , productid) values ('a1', '2022-01-09', '4a5d93a1f');
insert into purchases (accountid, purchase_date , productid) values ('a1', '2022-01-10', '4a5d93a1f');
insert into purchases (accountid, purchase_date , productid) values ('a1', '2022-01-10', '534ad451f');
insert into purchases (accountid, purchase_date , productid) values ('a1', '2022-01-10', '0f9d321ad');
insert into purchases (accountid, purchase_date , productid) values ('a1', '2022-01-11', '9cd018fc0');
I am trying to calculate
'a1', '2022-01-01', 1
'a1', '2022-01-02', 1
'a1', '2022-01-03', 1
'a1', '2022-01-04', 1
'a1', '2022-01-05', 2
'a1', '2022-01-06', 2
'a1', '2022-01-07', 2
'a1', '2022-01-08', 3
'a1', '2022-01-09', 3
'a1', '2022-01-10', 3
'a1', '2022-01-11', 4
I have tried using an inner join on the purchases table to itself as
select
t1.accountid,
t1.purchasedate,
count(distinct t1.productid)
from
purchases as t1
left join purchases as t2 on t1.accountid = t2.accountid and t1.purchase_date >= t2.purchase_date
group by
t1.accountid,
t1.purchasedate,
order by
t1.accountid,
t2.purchasedate
But that's not working.
Any help would be appreciated.