I have a table in BigQuery that represents the edges of a graph. Each node in the graph has a type. Each edge has a created_at property. Now I want to cut the components into subcomponents. So nodes of certain types only appear once in each subcomponent. I want to do this with as little edge cuts as possible. If necessary, prefer to cut younger edges rather than older ones.
- What is an efficient way to do this?
- Can this be done in SQL via recursion?
- What is the type of algorithm that I need to use?
This is an example table structure.
Nodes of type A should only appear once per subcomponent in the resulting subcomponent lookup.
Nodes:
node_id | node_type |
---|---|
1 | A |
2 | B |
3 | A |
4 | C |
5 | A |
6 | B |
Edges
edge_id | source_node | target_node | created_at | source_type | target_type |
---|---|---|---|---|---|
1 | 1 | 4 | 2024-05-01 20:43:00 | A | C |
2 | 2 | 6 | 2024-05-02 06:43:00 | B | B |
3 | 3 | 6 | 2024-05-03 08:43:00 | A | B |
4 | 6 | 4 | 2024-05-04 19:43:00 | B | C |
5 | 4 | 5 | 2024-05-05 21:43:00 | C | A |
reversed edges from above… | |||||
6 | 4 | 1 | 2024-05-01 20:43:00 | C | A |
7 | 6 | 2 | 2024-05-02 06:43:00 | B | B |
8 | 6 | 3 | 2024-05-03 08:43:00 | B | A |
9 | 4 | 6 | 2024-05-04 19:43:00 | C | B |
10 | 5 | 4 | 2024-05-05 21:43:00 | A | C |
This is what I currently do to get the component IDs. Where I’m currently stuck is how to cut these components into subcomponents.
with recursive
connected_components as (
select
source_node as node_id,
source_node as front,
target_node,
source_type,
target_type,
1 as depth,
[source_node] as visited_nodes
from edges
union all
select
cc.node_id,
e.source_node,
e.target_node as front,
e.source_type,
e.target_type,
cc.depth + 1 as depth,
array_concat(cc.visited_nodes, [e.target_node]) as visited_nodes
from edges as e
inner join connected_components as cc on e.source_node = cc.target_node
where cc.depth < 50 and not e.target_node in unnest(cc.visited_nodes)
),
walked as (
select distinct node_id, front, source_node, source_type, target_type
from connected_components
order by 1, 2
),
component as (
select node_id, min(front) as component_id from walked group by 1 order by 1
)
select * from component
As a result, I would want to have a similar lookup table but as a lookup for the subcomponent ID per node.