I am working on a problem were we have a table that holds id, product code , accountid , date
CREATE TABLE AccountBalanceAndProduct
(
[ID] [int] NOT NULL IDENTITY(1, 1),
[EffectiveDate] [date] NULL,
[AccountID] [varchar] (20) NULL,
[Product] [varchar] (20)
)
INSERT INTO AccountBalanceAndProduct (EffectiveDate, Accountid, Product)
VALUES
('2024-07-10', 'Acc1', 'ABA'),
('2024-07-11', 'Acc1', 'ABA'),
('2024-07-12', 'Acc1', 'ABB'),
('2024-07-12', 'Acc1', 'ABA'),
('2024-07-13', 'Acc1', 'ABB'),
('2024-07-14', 'Acc1', 'ABA'),
('2024-07-15', 'Acc1', 'ABC'),
('2024-07-16', 'Acc1', 'ABC'),
('2024-07-17', 'Acc1', 'ABA'),
('2024-07-10', 'Acc2', 'ABA'),
('2024-07-11', 'Acc2', 'ABA'),
('2024-07-12', 'Acc2', 'ABB'),
('2024-07-13', 'Acc2', 'ABB'),
('2024-07-14', 'Acc2', 'ABA'),
('2024-07-15', 'Acc2', 'ABC'),
('2024-07-16', 'Acc2', 'ABC'),
('2024-07-17', 'Acc2', 'ABA')
What I am trying to work out is when an account changes to a new product with To and From dates of the change along with To and From Product codes
I have wrote the below code but it does not take into account if an account changes from one product to another then back again. I am struggling to find something online to see how this can be done.
Records are added daily to this table with the current product and a account and can hold two at the same time briefly while a manual operations process removes the old product- normally this happens before the new product is added.
;WITH ProductSwitch AS
(
SELECT
a.id,
a.Accountid,
a.Product,
a.EffectiveDate[DateSwitchedTo],
ROW_NUMBER() OVER (PARTITION BY AccountID ORDER BY a.id ASC) [RowNumber]
FROM
(SELECT
MIN(bal.ID) [id],
bal.AccountId,
bal.Product,
MIN(bal.EffectiveDate) [EffectiveDate]
FROM
AccountBalanceAndProduct bal
WHERE
1 = 1
GROUP BY
bal.Product, bal.Accountid) a
)
SELECT
a.AccountID,
a.Product[ProductFrom],
a.DateSwitchedTo[FromDate],
b.Product[ProductTo],
b.DateSwitchedTo[ToDate],
b.RowNumber - 1 [TotalNumberOfSwitchesAtTimeOfSwitch]
FROM
(SELECT
asw.AccountID,
asw.Product,
asw.DateSwitchedTo,
asw.RowNumber
FROM
ProductSwitch asw) a
JOIN
(SELECT
asw.AccountID,
asw.Product,
asw.DateSwitchedTo,
asw.RowNumber
FROM
ProductSwitch asw
WHERE
asw.RowNumber > 1) b ON a.RowNumber = b.RowNumber - 1
AND a.AccountID = b.AccountID
I have created a fiddle https://dbfiddle.uk/8Y7ybsKp
This populates the table with dummy data and shows by current results along with expected result set
I hope that makes sense.
Expected results:
CREATE TABLE ExpectedResults
(
AccountID [varchar](20) NULL,
ProductFrom [varchar](20) NULL,
FromDate [date] NULL,
ProductTo [varchar](20) NULL,
ToDate [date] NULL,
TotalNumberOfSwitchesAtTimeOfSwitch int null
)
INSERT INTO ExpectedResults (AccountID, ProductFrom, FromDate,
ProductTo, ToDate, TotalNumberOfSwitchesAtTimeOfSwitch)
VALUES
('Acc1', 'ABA' ,'2024-07-10' ,'ABB' ,'2024-07-12', 1),
('Acc1', 'ABB' ,'2024-07-12' ,'ABA' ,'2024-07-14', 2),
('Acc1', 'ABA' ,'2024-07-14' ,'ABB' ,'2024-07-16', 3),
('Acc1', 'ABA' ,'2024-07-16' ,'ABC' ,'2024-07-17', 4),
('Acc2', 'ABA' ,'2024-07-10' ,'ABB' ,'2024-07-12', 1),
('Acc2', 'ABB' ,'2024-07-12' ,'ABA' ,'2024-07-14', 2),
('Acc2', 'ABA' ,'2024-07-14' ,'ABB' ,'2024-07-16', 3),
('Acc2', 'ABA' ,'2024-07-16' ,'ABC' ,'2024-07-17', 4)
4
Your source data is ambiguous when you have date-only columns with duplicate dates for the same account ('2024-07-12', 'Acc1', 'ABB')
and ('2024-07-12', 'Acc1', 'ABA')
. This could be interpreted either as ABA -> ABB -> ABA -> ABB or ABA -> ABA -> ABB -> ABB. SQL Server does not have an implicit row order other than that which is explicitly defined.
If we ignore that ambiguity, your question is a variation of a gaps-and-islands problem. In this case, the gaps are defined as changes to the Product
value for a given AccountID
across ascending EffectiveDate
values. These can be identified using the LAG()
fonction to compare product values to identify changes. For most gaps-and-islands problems, we would then group and combine the rows forming each island between the gaps, but in this case, we can just ignore the rows where Product
does not change.
It then becomes a matter of using the LEAD()
function to look ahead at the remaining rows to fill in the ProductTo
and ToDate
values, and a windowed COUNT(*)
to get the desired TotalNumberOfSwitchesAtTimeOfSwitch
values.
The resulting query would be something like:
WITH Changes AS (
SELECT *
FROM (
SELECT
*,
LAG(Product)
OVER(PARTITION BY AccountID ORDER BY EffectiveDate)
AS LagProduct
FROM AccountBalanceAndProduct
) A
WHERE LagProduct IS NULL OR LagProduct <> Product
)
SELECT
AccountID,
Product AS ProductFrom,
EffectiveDate AS FromDate,
LEAD(Product)
OVER(PARTITION BY AccountID ORDER BY EffectiveDate)
AS ProductTo,
LEAD(EffectiveDate)
OVER(PARTITION BY AccountID ORDER BY EffectiveDate)
AS ToDate,
COUNT(*)
OVER(PARTITION BY AccountID ORDER BY EffectiveDate
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS TotalNumberOfSwitchesAtTimeOfSwitch
FROM Changes
ORDER BY AccountID, EffectiveDate
Results (after excluding the ambiguous 2024-07-12
row):
AccountID | ProductFrom | FromDate | ProductTo | ToDate | TotalNumberOfSwitches AtTimeOfSwitch |
---|---|---|---|---|---|
Acc1 | ABA | 2024-07-10 | ABB | 2024-07-12 | 1 |
Acc1 | ABB | 2024-07-12 | ABA | 2024-07-14 | 2 |
Acc1 | ABA | 2024-07-14 | ABC | 2024-07-15 | 3 |
Acc1 | ABC | 2024-07-15 | ABA | 2024-07-17 | 4 |
Acc1 | ABA | 2024-07-17 | null | null | 5 |
Acc2 | ABA | 2024-07-10 | ABB | 2024-07-12 | 1 |
Acc2 | ABB | 2024-07-12 | ABA | 2024-07-14 | 2 |
Acc2 | ABA | 2024-07-14 | ABC | 2024-07-15 | 3 |
Acc2 | ABC | 2024-07-15 | ABA | 2024-07-17 | 4 |
Acc2 | ABA | 2024-07-17 | null | null | 5 |
Note that the above also includes the final change with starting but no ending values. It also has several other differences from your posted expected results, but I believe the above accurately reflects the input data. Please review your expected results. If the above is not what you are looking for, please expand on your description of your intended logic.
See this db<>fiddle for a demo.
Only a partial answer, but it should point you in the right direction.
What I would do is use LAG()
instead of Row_Number()
to define the window, so that I can filter for rows where the function result does not match the current value:
WITH PriorProducts As
(
SELECT ID, EffectiveDate, AccountID, Product,
LAG(Product) OVER (PARTITION BY AccountID ORDER BY EffectiveDate, ID) ProductFrom,
LAG(EffectiveDate) OVER (PARTITION BY AccountID ORDER BY EffectiveDate, ID) PriorProductEndDate
FROM AccountBalanceAndProduct
)
SELECT ID, AccountID
, Product ProductTo
, EffectiveDate DateStarted
, ProductFrom
, PriorProductEndDate
FROM PriorProducts
WHERE ProductFrom <> Product
AND ProductFrom IS NOT NULL
This isn’t a perfect set of what you want (it doesn’t handle the full dates of set… rather it shows rows for the boundaries). But it should get you there a little easier.
Otherwise, what you’re doing is called gaps and islands, and there’s lots of help online for ways to do this.
This is near to approach, shown by @TN.
I would like to consider in more detail the case when there are two rows on the same date. In this case, let’s try to minimize the number of transitions from product to product. If one of the lines of this day matches the value in the previous day, put this line first in the list. See rows with id 3 and 4.
ID | AccountID | EffectiveDate | Product | rn | rnk | intraNum | newGR | grn | prevProduct | prevEffectiveDate |
---|---|---|---|---|---|---|---|---|---|---|
1 | Acc1 | 2024-07-10 | ABA | 1 | 1 | 0 | 0 | 0 | ABA | 2024-07-10 |
2 | Acc1 | 2024-07-11 | ABA | 2 | 2 | 0 | 0 | 0 | ABA | 2024-07-10 |
4 | Acc1 | 2024-07-12 | ABA | 4 | 3 | 0 | 0 | 0 | ABA | 2024-07-11 |
3 | Acc1 | 2024-07-12 | ABB | 3 | 3 | 1 | 1 | 1 | ABA | 2024-07-12 |
5 | Acc1 | 2024-07-13 | ABB | 5 | 5 | 1 | 0 | 1 | ABB | 2024-07-12 |
For rows (4,Acc1,2024-07-12,ABA) and (3,Acc1,2024-07-12,ABB) we take first row (4,Acc1,2024-07-12,ABA), because previous row is with product ABA
See example
with intraRange as(
select *
,lag(Product,1+rn-rnk,Product)over(partition by AccountId order by EffectiveDate,Id) prevProduct0
,case when lag(Product,1+rn-rnk,Product)
over(partition by AccountId order by EffectiveDate,Id) =Product
then 0
else 1
end intraNum
from (
select ID,AccountID, EffectiveDate, Product
,row_number()over(partition by AccountId order by EffectiveDate,Id)rn
,rank()over(partition by AccountId order by EffectiveDate)rnk
from AccountBalanceAndProduct
)t
)
,tgaps as (
select *
,case when lag(Product,1,Product)
over(partition by AccountId order by EffectiveDate,intraNum) =Product
then 0
else 1
end newGR
from intraRange
)
,tgr as(
select ID, AccountID, EffectiveDate,intraNum,Product
,sum(newgr)over(partition by AccountId order by EffectiveDate,intraNum)+1 grn
,rn,rnk,newGR
from tgaps
)
,tgr_pn as(
select AccountId,grn,min(Product) Product
,min(EffectiveDate) minDate,max(EffectiveDate) maxDate
,min(rn)rnMin,max(rn)rnMax
,lead(min(EffectiveDate),1,max(EffectiveDate))over(partition by AccountId order by grn)nextDate
,lead(min(Product),1,min(Product))over(partition by AccountId order by grn)nextProduct
from tgr
group by AccountId,grn
)
select AccountId,grn
,Product FromProduct, minDate FromDate
,nextProduct ToProduct,nextDate ToDate
nextProduct
from tgr_pn
order by accountId,grn;
Fiddle