Let’s consider a Cypher query where a and b are the two ends of a complex path:
MATCH (a:Label1)-[a quite complex path]-(b:Label2)
WHERE <a set of complex constraints>
RETURN DISTINCT a, b
At this point, I get (let’s say) 20,000 results, but for only 2,000 distinct ‘a’ and 3,000 distinct ‘b’.
I now need to add a specific (heavy) constraint on ‘a’, and another on ‘b’. I can’t add those constraints before computing the path a-[…]-b, because they are too heavy to be computed on the full dataset.
I tried to update my query this way:
MATCH (a:Label1)-[a quite complex path]-(b:Label2)
WHERE <a set of complex constraints>
WITH DISTINCT a, b
WHERE [constraint on a] AND [constraint on b]
RETURN a, b
But now, [constraint on a] and [constraint on b] are both computed 20,000 times, while 2,000 (for a) and 3,000 (for b) would be enough.
How should I organize this query?
Thanks!
The query below improves on your second query. After generating DISTINCT a/b pairs, it uses COLLECT subqueries to generate lists of the distinct a and b values that satisfy their constraints, and then uses those lists for the final a/b pair filtering.
MATCH (a:Label1)-[a quite complex path]-(b:Label2)
WHERE <a set of complex constraints>
WITH DISTINCT a, b
WHERE a IN COLLECT{WITH DISTINCT a WHERE [constraint on a] RETURN a} AND
b IN COLLECT{WITH DISTINCT b WHERE [constraint on b] RETURN b}
RETURN a, b
[constraint on a] would be computed only 2K times, and [constraint on b] only 3K times.
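To make the evaluation counts concrete, here is a small Python sketch (not Cypher) of the idea: evaluate each heavy constraint once per distinct endpoint, then filter the pairs against the surviving values. Everything in it is a stand-in: constraint_a, constraint_b, filter_pairs and the toy data are hypothetical, and the data is scaled down by 100 (200 pairs, 20 distinct a, 30 distinct b). It models the intended counts only; how the Cypher planner actually executes the query above is best confirmed with PROFILE.

```python
# Counters showing how often each (hypothetical) heavy constraint runs.
calls = {"a": 0, "b": 0}

def constraint_a(a):
    """Stand-in for [constraint on a]; assumed expensive in the real query."""
    calls["a"] += 1
    return a % 2 == 0

def constraint_b(b):
    """Stand-in for [constraint on b]; assumed expensive in the real query."""
    calls["b"] += 1
    return b % 3 == 0

def filter_pairs(pairs):
    """Evaluate each constraint once per distinct endpoint, then filter pairs."""
    distinct_a = {a for a, _ in pairs}
    distinct_b = {b for _, b in pairs}
    ok_a = {a for a in distinct_a if constraint_a(a)}
    ok_b = {b for b in distinct_b if constraint_b(b)}
    # Set membership is cheap, so checking every pair here costs little.
    return [(a, b) for a, b in pairs if a in ok_a and b in ok_b]

# Toy stand-in for the DISTINCT a/b pairs: 20 distinct a, 30 distinct b, 200 pairs.
pairs = [(a, b) for a in range(20) for b in range(30) if (a + b) % 3 == 0]

result = filter_pairs(pairs)
print(len(pairs), calls["a"], calls["b"], len(result))
```

Each constraint runs once per distinct value (20 and 30 times here) rather than once per pair (200 times), which is the same ratio as 2,000 and 3,000 versus 20,000 in the question.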