Given a table like the following
WITH xs AS (
SELECT 'A' name, 21 value
UNION ALL
SELECT 'B', 24
UNION ALL
SELECT 'C', 28
UNION ALL
SELECT 'D', 47
UNION ALL
SELECT 'E', 48
UNION ALL
SELECT 'F', 49
UNION ALL
SELECT 'G', 50
UNION ALL
SELECT 'H', 58
)
I’d like to keep only the rows that are required to not open up new gaps larger than 10 in the sorted value
column.
For my understanding, I’ve implemented this algorithm in Python
from dataclasses import dataclass
max_gap = 10
@dataclass
class Row:
name: str
value: int
xs = [Row('A', 21), Row('B', 24), Row('C', 28), Row('D', 47), Row('E', 48), Row('F', 49), Row('G', 50), Row('H', 58)]
ys = [xs[0]]
for idx in range(1, len(xs) - 1):
if xs[idx + 1].value > ys[-1].value + max_gap:
ys.append(xs[idx])
ys.append(xs[-1])
assert ys == [Row('A', 21), Row('C', 28), Row('D', 47), Row('G', 50), Row('H', 58)]
but I’m struggling to convert it to BigQuery SQL.
My problem is, that the decision for each row not only depends on the neighboring rows (typical application of WINDOW
with LEAD
and LAG
), but also also on the last value in the output table (ys
).
I’m not sure if recursive CTEs would be a good fit, or if I need to use the procedural language (LOOP
/FOR
), or if there is an elegant solution utilizing “usual” SQL logic to solve this.
It would be awesome if somebody could show me how to do it.