We have a very large graph stored in Neptune (~26 million rels, ~20 million nodes) and need to find all the directed cycles between nodes with specific labels, following only relationships with certain properties. There are many labels for the nodes; for this task we only care about nodes with the FOO label. There are several rel types; we only care about rels of type type_a. There are many properties on each rel; we only care about rels with the property category = "1".
The network seems too large for simply searching for the cycles directly, even if we ignore the required property.
MATCH p=(n:FOO)-[r:type_a*]->(n)
RETURN p
This command always times out (unless I limit it). It also does not filter by rel property.
My instinct is to try and filter the graph to greatly reduce its size and improve the search. I can isolate all the node1-rel-node2 paths easily with the following:
MATCH p=(n:FOO)-[r:type_a]->(m:FOO)
WHERE r.category = "1"
RETURN nodes(p), relationships(p)
But I don't know how to then project this list of nodes and relationships into a new, separate graph in an efficient way (i.e. without saving to a CSV and creating a new Neptune instance).
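To make the intended projection concrete, here is a minimal Python sketch of the filtering step on toy in-memory data. It is not Neptune code; the node ids, the edge tuples and the helper name project_subgraph are made up for illustration, and only the FOO label, the type_a rel type and category = "1" come from the real graph.

```python
from collections import defaultdict

# Toy data; node ids and edge tuples are hypothetical, for illustration only.
node_labels = {"a": "FOO", "b": "FOO", "c": "FOO", "d": "BAR"}
edges = [
    # (source, target, rel_type, properties)
    ("a", "b", "type_a", {"category": "1"}),
    ("b", "c", "type_a", {"category": "1"}),
    ("c", "a", "type_a", {"category": "2"}),  # wrong category, dropped
    ("b", "d", "type_a", {"category": "1"}),  # target is not FOO, dropped
    ("c", "b", "type_b", {"category": "1"}),  # wrong rel type, dropped
]


def project_subgraph(node_labels, edges, label="FOO", rel_type="type_a", category="1"):
    """Return an adjacency map holding only the edges that pass all three filters."""
    adjacency = defaultdict(set)
    for src, dst, etype, props in edges:
        if (
            node_labels.get(src) == label
            and node_labels.get(dst) == label
            and etype == rel_type
            and props.get("category") == category
        ):
            adjacency[src].add(dst)
    return dict(adjacency)


subgraph = project_subgraph(node_labels, edges)
print(subgraph)
```

The result is the much smaller structure I would like to run the cycle search on, without touching the main graph.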
I believe I can chain commands together using WITH but I again run into the issue of not knowing how to handle a list of relationships where I want every relationship to have a certain property. The closest I can get is the following, which does not seem to work:
MATCH (n:FOO)-[r:type_a]->(m:FOO)
WHERE r.category = "1"
WITH n, r
MATCH p = (n)-[r*]->(n)
RETURN p
I have tried several variations on this. This version gives an error declaring that r is already defined. An attempt to redefine r tells me that r is not a MAP type but a LIST<relationship> type. I understand the error, but don't know how to resolve it.
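As I understand it, a variable-length pattern binds r to a list of relationships rather than a single one, so the property check has to hold for every element of that list. A small Python sketch of the condition I am after, with each relationship modelled as a plain dict (the helper name all_rels_match and the sample paths are hypothetical):

```python
def all_rels_match(path_rels, key, value):
    """True only if every relationship in the path carries key == value."""
    return all(rel.get(key) == value for rel in path_rels)


# Each path is the list of relationships a variable-length match would bind.
good_path = [
    {"type": "type_a", "category": "1"},
    {"type": "type_a", "category": "1"},
]
bad_path = [
    {"type": "type_a", "category": "1"},
    {"type": "type_a", "category": "2"},  # one rel fails, so the whole path fails
]

print(all_rels_match(good_path, "category", "1"))
print(all_rels_match(bad_path, "category", "1"))
```

What I cannot work out is how to express this per-element check in a query that Neptune accepts.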
Sadly, using graph networks is new to my organisation. We cannot use Neo4j or Memgraph. I have been working with graphs and Cypher for only a month or so. I am now the most knowledgeable person in my organisation and therefore cannot ask for help.
From this, I have several main questions:
- Knowing that my graph is very large, and that we require the main graph to remain “whole”, is retrieving a subset of just the nodes and edges I require and projecting it to a new graph the best approach? If so, how do I achieve the projection?
- Regardless of the answer to the first question, how would I apply a property match to every element of the list of relationships obtained from a variable-length path query between two nodes?
- Is there a better way to find all the (unique) cycles with the required labels, types and properties in Neptune? I know Neo4j has the apoc library, and I was able to achieve my goals with apoc.nodes.cycles before we encountered a situation that required us to stop using Neo4j (a security/cost-based one, not a performance-based one), but I don't believe there is a similar library for openCypher and Neptune. Would a Gremlin query be better for this? What would such a query look like?
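For clarity on what I mean by "unique" cycles: each directed cycle should be reported once, not once per starting node. A minimal Python sketch of that behaviour on a toy adjacency map is below. The function name simple_cycles and the sample graph are made up for illustration, the input is assumed to be already filtered to FOO nodes and type_a rels with category = "1", and this brute-force search is only meant to pin down the expected output, not to scale to the real graph.

```python
def simple_cycles(adjacency):
    """Yield each directed simple cycle once, rooted at its smallest node."""
    nodes = sorted(set(adjacency) | {m for targets in adjacency.values() for m in targets})
    for start in nodes:
        # Depth-first search that only walks through nodes ordered after `start`,
        # so a cycle is found only from its smallest member.
        stack = [(start, [start])]
        while stack:
            node, path = stack.pop()
            for nxt in sorted(adjacency.get(node, ())):
                if nxt == start:
                    yield path
                elif nxt > start and nxt not in path:
                    stack.append((nxt, path + [nxt]))


# Toy filtered graph with two cycles: a -> b -> c -> a and b -> d -> b.
filtered = {"a": ["b"], "b": ["c", "d"], "c": ["a"], "d": ["b"]}
cycles = list(simple_cycles(filtered))
print(cycles)
```

Here the cycle through a, b and c appears once, starting at a, instead of three times as rotations of the same path.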