I have a table that contains two sets of IDS: IDA and IDB, as well as an asofdate filter.
IDA | IDB | asofdate |
---|---|---|
1 | 1 | ‘2024-12-13’ |
2 | 1 | ‘2020-10-20’ |
2 | 2 | ‘2024-12-13’ |
3 | 2 | ‘2024-12-13’ |
4 | 1 | ‘2020-10-20’ |
5 | 3 | ‘2020-10-20’ |
6 | 4 | ‘2024-12-13’ |
7 | 4 | ‘2020-10-20’ |
8 | 5 | ‘2024-12-13’ |
9 | 2 | ‘2020-10-20’ |
10 | 6 | ‘2020-10-20’ |
I want to do the following:
- Create a group for each IDB. If that IDB is mapped to a different IDA and that IDA is mapped to a different IDB, add the new IDB to the group. Repeat for each record.
Example 1 – see The 2nd and 3rd record of the table. The two IDBs of 1 and 2 are mapped to IDA 2. Therefore they would sit under the same group: group 1.
Example 2 – see the third record of the table, because the IDB group of IDBs 1 and 2 has already been created and The IDA 4 is not mapped to any other IDBs the grouping would be the same as the previous example: group 1.
Example 4 – see the records with IDB = 4. Since the IDAs under the IDB 4 are not mapped to any other IDBs anywhere else they are assigned their own group: group 3
Based on the table above there would be groups which should looks like this:
IDA | IDB | asofdate | grouping |
---|---|---|---|
1 | 1 | ‘2024-12-13’ | 1 |
2 | 1 | ‘2020-10-20’ | 1 |
2 | 2 | ‘2024-12-13’ | 1 |
3 | 2 | ‘2024-12-13’ | 1 |
4 | 1 | ‘2021-10-20’ | 1 |
5 | 3 | ‘2024-12-13’ | 2 |
6 | 4 | ‘2024-12-13’ | 3 |
7 | 4 | ‘2024-10-20’ | 3 |
8 | 5 | ‘2024-12-13’ | 4 |
9 | 2 | ‘2020-10-20’ | 1 |
10 | 6 | ‘2020-10-20’ | 5 |
- within each IDB group count the number of distinct IDAs with a current as of date
- associate that count with each IDA from step 1.
The ideal result would look like this with asofdate = ‘2024-12-13’:
IDA | Count_current_IDA |
---|---|
1 | 3 |
2 | 3 |
3 | 3 |
4 | 3 |
5 | 1 |
6 | 1 |
7 | 1 |
8 | 1 |
9 | 3 |
10 | 0 |
So I have been tinkering around with this.
The following query performs the operation for a single “@IDA” variable that can be entered and returns a count for that IDA.
SELECT count(distinct IDA)
FROM mytable
WHERE IDB IN (SELECT IDB FROM mytable WHERE IDA = @IDA) AND asOfDate = @today
I want expand this individual operation such that a query returns a count for each ID-A in mytable
I tried modifying the subquery above and then grouping by IDA, but it is returning incorrect values.
SELECT count(distinct IDA)
FROM mytable t
WHERE IDB IN (SELECT IDB FROM mytable WHERE IDA = t.IDA) AND asOfDate = @today
Group by t.IDA
The difficult part here is that the IDB grouping may contain more than one type of IDB
Ant Par is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
2
For this answer, I think that your key issue is how to do the grouping; the other components (filtering by date, etc) are straightforward enough.
As such, this answer is focussed on the grouping issue. However (and unfortunately) I believe this is almost impossible to do with a single straight query.
The issue is that the answers could have any number of recursive links to be grouped. For example, consider the following dataset
IDA IDB AsOfDate
21 2 NULL
22 2 NULL
22 3 NULL
23 3 NULL
23 4 NULL
24 4 NULL
24 5 NULL
25 5 NULL
25 6 NULL
26 6 NULL
26 7 NULL
27 7 NULL
27 8 NULL
28 8 NULL
28 9 NULL
29 9 NULL
Every row is actually part of the same group – it is a big chain (e.g., IDA 21 links to 22 via IDB 2, then 22 links to 23 via IDB 3, … then IDA 29 links to IDA 29 via IDB 9.
As such, you need some sort of recursive way of doing this.
It is most likely possible to do this with a recursive CTE – but I’ve used a WHILE loop below because it’s much easier to understand.
I am going to construct this using a temporary table, including the grouping field. However, for the grouping field, I am going to use the minimum relevant IDB rather than just 1, 2, 3… (in other words, for a given group, the ‘grouping number’ will actually be the minimum IDB for that group).
Compared with your data, grouping 1 would be the same (IDB_Min would be 1), grouping 2 would become IDB_Min 3, grouping 4 would be IDB_Min 5, etc. I think it makes the processing clearer – if you want, you can modify it to simple incrementing group numbers using DENSE_RANK or similar.
Start with the key table
CREATE TABLE #D (IDA int, IDB int, asofdate date, IDB_Min int);
To start with, IDB_Min is set as the current IDB for that row.
Now, the key process is to update this table using two steps
- Step 1: For all rows, if it should have a lower IDB_Min because of IDA, then lower the IDB_Min appropriately
- Step 2: For all rows, if it now should have a lower IDB_Min because of IDB, then lower the IDB_Min appropriately
Step 1 can be accomplished with a relatively simple statement – it finds the minimum of the IDB_Min values (for a given IDA) and assigns that to any rows that don’t have that already.
WITH AGrp AS
(SELECT IDA, MIN(IDB_Min) AS Min_IDB_Min
FROM #D
GROUP BY IDA
)
UPDATE #D
SET IDB_Min = AGrp.Min_IDB_Min
FROM #D
INNER JOIN AGrp ON #D.IDA = AGrp.IDA
WHERE #D.IDB_Min > AGrp.Min_IDB_Min;
Similarly, IDB can be updated in a similar fashion.
WITH BGrp AS
(SELECT IDB, MIN(IDB_Min) AS Min_IDB_Min
FROM #D
GROUP BY IDB
)
UPDATE #D
SET IDB_Min = BGrp.Min_IDB_Min
FROM #D
INNER JOIN BGrp ON #D.IDB = BGrp.IDB
WHERE #D.IDB_Min > BGrp.Min_IDB_Min;
In your dataset, step 1 would modify one row (IDA 2, IDB 2) to set IDB_Min = 1 (from IDA 2, IDB 1).
Then step 2 would modify a further two rows – (3, 2) and (9, 2) – as IDB 2 is now linked to IDB_Min 1.
Because these steps actually did something (changed some rows) we cannot be sure we’re finished – so we need to run the process again – and that is the basis of the loop/recursive process.
For demonstrating the solution, I have created three db<>fiddles
- One showing the full solution (https://dbfiddle.uk/p5g1NDty)
- One showing step 1 and step 2 individiaully (https://dbfiddle.uk/BeL6iKW_)
- One showing the chaining issues using the example data above (https://dbfiddle.uk/wK8roSE9) – with a select statement showing the ‘results’ after each recursion
The final suggestion is below
UPDATE #D
SET IDB_Min = IDB;
DECLARE @n int = 1;
WHILE @n > 0
BEGIN
WITH AGrp AS
(SELECT IDA, MIN(IDB_Min) AS Min_IDB_Min
FROM #D
GROUP BY IDA
)
UPDATE #D
SET IDB_Min = AGrp.Min_IDB_Min
FROM #D
INNER JOIN AGrp ON #D.IDA = AGrp.IDA
WHERE #D.IDB_Min > AGrp.Min_IDB_Min;
SET @n = @@ROWCOUNT;
WITH BGrp AS
(SELECT IDB, MIN(IDB_Min) AS Min_IDB_Min
FROM #D
GROUP BY IDB
)
UPDATE #D
SET IDB_Min = BGrp.Min_IDB_Min
FROM #D
INNER JOIN BGrp ON #D.IDB = BGrp.IDB
WHERE #D.IDB_Min > BGrp.Min_IDB_Min;
SET @n = @n + @@ROWCOUNT;
END;
SELECT * FROM #D;
With the results as follows
IDA IDB asofdate IDB_Min
------- ------ ---------- ----------
1 1 2024-12-13 1
2 1 2020-10-20 1
2 2 2024-12-13 1
3 2 2024-12-13 1
4 1 2020-10-20 1
5 3 2020-10-20 3
6 4 2024-12-13 4
7 4 2020-10-20 4
8 5 2024-12-13 5
9 2 2020-10-20 1
10 6 2020-10-20 6
1
I think for IdA = 5, it has an incorret result in its response
I created a Recursive Function that get the First Occurence of IdA in all dataset
CREATE OR ALTER FUNCTION dbo.GetFirstOccurrenceIdBy(@IdA int, @IdB int)
RETURNS INT
AS
BEGIN
DECLARE @FirstIdAOccurence int,@FirstIdBOccurence int
SELECT TOP 1
@FirstIdAOccurence= IdA
, @FirstIdBOccurence = IdB
FROM JustTable
WHERE @IdA = IdA
OR @IdB = IdB
IF @IdA = @FirstIdAOccurence AND @IdB = @FirstIdBOccurence
RETURN @FirstIdAOccurence
RETURN dbo.GetFirstOccurrenceIdBy(@FirstIdAOccurence,@FirstIdBOccurence)
END
Then, I added a DENSE_RANK function for classification
;WITH TAnalitical
as
(
SELECT
*
,DENSE_RANK() OVER (ORDER BY dbo.GetFirstOccurrenceIdBy(IdA,IdB)) as [grouping]
FROM JustTable
)
SELECT * FROM TAnalitical ORDER BY IdA, IdB
With TAnalitical data, I can also create a Sintetic result
;WITH TAnalitical
as
(
SELECT
*
,DENSE_RANK() OVER (ORDER BY dbo.GetFirstOccurrenceIdBy(IdA,IdB)) as [grouping]
FROM JustTable
),
TSintetic
as(
SELECT
IdA
,IdB
,AsofDate
,(
SELECT count([grouping])
FROM TAnalitical a2
WHERE a1.[grouping] = a2.[grouping]
) AS Count_current_IdA
FROM TAnalitical a1
)
SELECT * FROM TSintetic ORDER BY IdA, IdB
Complete Solution