Table below – order by [low],[high]
Group | Section | Low | High |
---|---|---|---|
A | 01 | 01 | 10 |
A | 01 | 11 | 15 |
A | 01 | 18 | 18 |
A | 01 | 19 | 25 |
A | 02 | 26 | 30 |
A | 02 | 31 | 32 |
A | 01 | 33 | 40 |
A | 01 | 41 | 41 |
By using the table above, I would like to create another table (the example below), so every time before the [Section]
change, I would like the get the MIN(Low)
and MAX(High)
.
Group | Section | Low | High |
---|---|---|---|
A | 01 | 01 | 25 |
A | 02 | 26 | 32 |
A | 01 | 33 | 41 |
Obviously, this is too simple and would not work
select [group], [section], min([low]), max([high])
from table
group by [group], [section]
3
This is called gaps and islands, where you want to group things until a change.
I usually go for this three-stage solution:
;WITH data AS (
SELECT *
FROM
(
VALUES (N'A', N'01', N'01', 10)
, (N'A', N'01', N'11', 15)
, (N'A', N'01', N'18', 18)
, (N'A', N'01', N'19', 25)
, (N'A', N'02', N'26', 30)
, (N'A', N'02', N'31', 32)
, (N'A', N'01', N'33', 40)
, (N'A', N'01', N'41', 41)
) t ([Group],Section,Low,High)
)
SELECT grouping, [group], MIN(Low) AS minLow, MAX(high) AS maxHigh /* 3 */
FROM (
SELECT COUNT(flag) OVER(partition BY [group] ORDER BY low ROWS UNBOUNDED PRECEDING) AS grouping /* 2 */
, *
FROM (
SELECT CASE WHEN LAG(section) OVER(partition BY [group] ORDER BY low) <> section THEN 1 END AS flag /* 1 */
, *
FROM data
) x
) x
GROUP BY [group], grouping
- You create a
flag
which says if previous value is not the same as current, every time this flags turns to 1, it’s a start of a new group. - Count or summarize the flag using a ordered window function. This creates the final grouping. The
rows unbounded preceding
is mostly Sql Server goo which improves performance of the window aggregate - Now, you’re ready to actually perform the aggregation based on our new grouping column, it’s a simple GROUP BY.
The output:
grouping | group | minLow | maxHigh |
---|---|---|---|
0 | A | 01 | 25 |
1 | A | 26 | 32 |
2 | A | 33 | 41 |
PS: watch out for the low/high values, since you use 01
they might not be real integers but varchars, then you have to make sure they’re all padded to same length, because a varchar 99
is more than 100
0
You could also find the different groups by partitioning those with various combinations and finding their difference.
;WITH GroupedData AS (
SELECT
GroupName,
Section,
[Low],
[High],
ROW_NUMBER() OVER (PARTITION BY GroupName ORDER BY [Low])
- ROW_NUMBER() OVER (PARTITION BY GroupName, Section ORDER BY [Low]) AS GroupingKey
FROM DataTable
)
--select * from GroupedData;
SELECT
GroupName,
Section,
MIN([Low]) AS [Low],
MAX([High]) AS [High]
FROM GroupedData
GROUP BY GroupName, Section, GroupingKey
ORDER BY 1, 3;
Result Grid: