I’m working on a SQL solution to merge consecutive rows where one row’s “close” value matches the next row’s “start” value within the same name. Here is an example input and the required output:
Input:
+----+-----+-----+
|name|start|close|
+----+-----+-----+
|A |120 |130 |
|A |130 |140 |
|A |140 |150 |
|A |152 |160 |
|A |160 |180 |
|B |100 |130 |
|B |130 |200 |
|B |202 |250 |
|C |300 |400 |
+----+-----+-----+
Required Output:
+----+-----+-----+
|name|start|close|
+----+-----+-----+
|A |120 |150 |
|A |152 |180 |
|B |100 |200 |
|B |202 |250 |
|C |300 |400 |
+----+-----+-----+
I tried with the lag() function, but I am not getting the correct output: the filter keeps only the first row of each run and never extends its close value, so for example A’s first group ends at 130 instead of 150.
with
t1 as (
    -- pclose: the previous row's close within the same name
    select name, start, close,
           lag(close) over (partition by name order by start) as pclose
    from meeting
),
t2 as (
    -- keep only rows whose start does not continue the previous close
    select * from t1
    where 1 = (case when pclose is null then 1
                    when start = pclose then 0
                    else 1 end)
)
select * from t2 order by name, start
Any ANSI-compliant solution is welcome, as I can port it easily to Spark.
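
For reference, here is a sketch of the standard “gaps and islands” approach I have been experimenting with; it uses only ANSI window functions, so it should port to Spark SQL unchanged. It assumes the same meeting(name, start, close) table as above and is a starting point, not a verified answer. Instead of filtering rows, it flags the row that starts each run, turns the flags into group ids with a running sum, and then aggregates each group:

with flagged as (
    -- 1 marks a row that starts a new run: its start does not equal
    -- the previous row's close (lag is null on the first row, so the
    -- comparison is unknown and falls through to 1)
    select name, start, close,
           case when start = lag(close) over (partition by name order by start)
                then 0 else 1 end as is_new_run
    from meeting
),
grouped as (
    -- a running sum of the flags assigns every row of a run the same group id
    select name, start, close,
           sum(is_new_run) over (partition by name order by start) as grp
    from flagged
)
select name, min(start) as start, max(close) as close
from grouped
group by name, grp
order by name, start;

On the sample data this should produce the required output: A collapses to (120, 150) and (152, 180), B to (100, 200) and (202, 250), and C stays (300, 400).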