I’m trying to do some log analysis and I’m trying to get a breakdown by the various levels of a domain name… i.e. for a domain name with 4 codons (a.b.c.com for example), I want to be able to roll them up by .com, c.com, b.c.com and a.b.c.com.
I’ve been tinkering with string_split and a recursive cte for a while but the thing that appears to be tripping me up is the recursive portion and how the window functions work (or don’t) in that part.
; WITH parts as ( -- break up the domain name and put a number on each piece for windowing functions
select top 100000 value as dom, ROW_NUMBER() OVER (order BY (SELECT NULL)) id
from string_split('www.google.com', '.')
),
agg as (
-- try to put the pieces back together
SELECT cast(dom as varchar(max)) thisLevel, cast(LEAD(dom) OVER (order by id desc) as varchar(max)) nextLevel, cast(LEAD(dom) OVER (order by id desc) + '.' + dom as varchar(max)) toplevel, id, MAX(ID) OVER ( order by id desc) maxID
FROM parts
where id > 0
UNION ALL
-- The OVER ORDER BY seems to be ignored on the recursion
SELECT thisLevel, nextLevel, cast(nextLevel + '.' + LEAD(topLevel) OVER (ORDER BY Id desc) as varchar(max)) as topLevel, id, maxid
from agg
where nextLevel is not null and toplevel is not null
)
select * from agg
From the anchor pass, I’m getting
com, google, google.com, 3, 3
google, www, www.google, 2, 3
www, NULL, NULL, 1, 3
as I would expect, but the LEAD() OVER (order by id desc) looks like it is ignored, and the recursion starts feeding the anchor batch in reverse order, which is puzzling me.
In the second pass it gets
google, www, null, 2, 3
com, google, null, 3, 3
when order by id desc seems like it should be going {3,2} instead of {2,3}
Am I missing something here?
1