Can someone please explain the chaining logic in this Cypher code?
It comes from the GraphAcademy lessons, and it converts the languages property of Movie nodes
into separate Language nodes.
MATCH (m:Movie)
UNWIND m.languages AS language
WITH language, collect(m) AS movies
MERGE (l:Language {name:language})
WITH l, movies
UNWIND movies AS m
WITH l,m
MERGE (m)-[:IN_LANGUAGE]->(l);
MATCH (m:Movie)
SET m.languages = null
In particular, this line is very mysterious to me: MERGE (m)-[:IN_LANGUAGE]->(l);
How does Cypher remember which movie maps to which language? We go through several WITH and UNWIND clauses before we get to WITH l,m, where l is the newly added language node and m is a movie unwound from the movies list.
Could you either help with a visualization or point me to something that explains the inner workings? It’s not intuitive at all.
I tried to read about the WITH clause, and I understand that it does the equivalent of SQL’s GROUP BY, but I’m still far from a clear picture of the inner process in Cypher.
An illustration may be helpful.
If we run this query:
MATCH (m:Movie)
UNWIND m.languages AS language
WITH language, collect(m) AS movies
MERGE (l:Language {name:language})
WITH l, movies
UNWIND movies AS m
WITH l, m
MERGE (m)-[r:IN_LANGUAGE]->(l);
And the DB has these 3 movies:
(m1:Movie {id: 'm1', languages: ['en']})
(m2:Movie {id: 'm2', languages: ['es', 'fr', 'en']})
(m3:Movie {id: 'm3', languages: ['fr', 'es']})
Here are the data rows resulting from each clause of the query (with variable names in each heading). Unquoted items in a row are either nodes or relationships; I take the liberty of using hopefully-understandable names for them.
MATCH (m:Movie)
m
--
m1
m2
m3
UNWIND m.languages AS language
m   language
--  --------
m1  'en'
m2  'es'
m2  'fr'
m2  'en'
m3  'fr'
m3  'es'
WITH language, collect(m) AS movies
language  movies
--------  --------
'en'      [m1, m2]
'es'      [m2, m3]
'fr'      [m2, m3]
MERGE (l:Language {name:language})
language  movies    l
--------  --------  --
'en'      [m1, m2]  en
'es'      [m2, m3]  es
'fr'      [m2, m3]  fr
WITH l, movies
l   movies
--  --------
en  [m1, m2]
es  [m2, m3]
fr  [m2, m3]
UNWIND movies AS m
l   movies    m
--  --------  --
en  [m1, m2]  m1
en  [m1, m2]  m2
es  [m2, m3]  m2
es  [m2, m3]  m3
fr  [m2, m3]  m2
fr  [m2, m3]  m3
WITH l, m
l   m
--  --
en  m1
en  m2
es  m2
es  m3
fr  m2
fr  m3
MERGE (m)-[r:IN_LANGUAGE]->(l);
l   m   r
--  --  -----
en  m1  r1_en
en  m2  r2_en
es  m2  r2_es
es  m3  r3_es
fr  m2  r2_fr
fr  m3  r3_fr
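One way to see these intermediate rows for yourself is to cut the query off after a given clause and RETURN whatever variables are in scope at that point. The following is a minimal sketch, assuming the three example movies above have been loaded, including the illustrative id property used in this answer. It stops after the first WITH and performs no writes.

```cypher
// Stop after the first WITH: one row per language, each with its list of movies.
// The list comprehension swaps each movie node for its id so the rows are easy to read.
MATCH (m:Movie)
UNWIND m.languages AS language
WITH language, collect(m) AS movies
RETURN language, [x IN movies | x.id] AS movie_ids
ORDER BY language;
```

The rows returned should correspond to the language/movies table above, although the order of the ids inside each list is not guaranteed. The same trick works at any later point in the query, keeping in mind that once a MERGE is included the query also writes to the graph.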
Let’s take this explanation line by line to see where you’re getting lost and whether we can add some clarification. Please comment if any part of the query remains unclear.
- MATCH (m:Movie): get all the nodes that have the Movie label.
- UNWIND m.languages AS language: each movie node has a languages property that is an array of strings. Each string represents a language that the movie is available in. UNWIND is basically like a procedural foreach, where the subsequent statements are interpreted from within the iterator block. So instead of having a list of languages, you get a single language, and the following statements distribute over the list.
- WITH language, collect(m) AS movies: select each language string paired with the list of movie nodes that are available in that language.
- MERGE (l:Language {name:language}): get the node with the Language label and that name. If it doesn’t exist yet, create it.
- WITH l, movies: select each language node paired with the list of movie nodes that are available in that language.
- UNWIND movies AS m: foreach movie node in the list of movie nodes that are available in the language represented by the language node.
- WITH l,m: select the language node paired with the movie node. This is similar to a procedural nested foreach loop, as in foreach (l in languages) { foreach (m in movies) { ... } }, except that only the pairs of languages and movies that are associated by a movie’s languages property are considered.
- MERGE (m)-[:IN_LANGUAGE]->(l): get the IN_LANGUAGE edge between each movie and language where the name of the language is in the movie’s languages list. If it doesn’t exist yet, create it.
- ; ends the first statement. At this point, you’ve created an edge from each movie to each language in the movie’s languages array.
- MATCH (m:Movie): get every movie node.
- SET m.languages = null: delete the languages property, which is reasonable since we now have relationships to language nodes that contain the same information as the deleted list property.
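The nested-loop reading above can be checked against a shorter variant of the first statement. This is a sketch rather than the course’s query: it skips the collect/UNWIND round trip and merges one relationship per (movie, language) row, on the assumption that it is run in place of the first statement while the languages property is still populated. The trade-off is that the Language MERGE then runs once per row rather than once per distinct language.

```cypher
// Each row produced by UNWIND holds one movie (m) and one of its languages,
// so both MERGE clauses see a matching pair without any extra bookkeeping.
MATCH (m:Movie)
UNWIND m.languages AS language
MERGE (l:Language {name: language})
MERGE (m)-[:IN_LANGUAGE]->(l);

// Afterwards, count the movies attached to each Language node.
MATCH (m:Movie)-[:IN_LANGUAGE]->(l:Language)
RETURN l.name AS language, count(m) AS movie_count
ORDER BY language;
```

The second statement works as a check after either version of the query, since it only reads the IN_LANGUAGE relationships and does not depend on the languages property.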