Environment
- PostgreSQL 9.1
- Java
Problem
I’ll try to illustrate the problem here as simply as possible:
I’m trying to record the following relationship:
Node 1 => Node 2 => Node 3
Where Node 1 is the parent of Node 2, and Node 2 is the parent of Node 3. In the database table, the initial design is tracking this data in a linked list fashion. However, I am finding this extremely inefficient because each time I want to trace the lineage of say Node 1, I’d have to retrieve Node 2, then with Node 2, I’d retrieve Node 3. Since I would be displaying the lineage of Node 1 to the user, I’d have to make 3 separate calls to the database to display this information.
I’d also like to backtrack, but for the sake of simplicity I left that part out.
Alternatives
Some alternatives I have considered, but I can’t go to at this time are Neo4j and other solutions like it. Unfortunately, at this time, I have to stay in Postgresql.
Question
Is there a design pattern such that I can retrieve all this information at once? If there is, how would you implement with the example above?
UPDATE
With @MichaelT’s comment, and finding Common Table Expressions, I’ve been lead to this link, which seems to provide more information on this:
https://stackoverflow.com/questions/53108/is-it-possible-to-make-a-recursive-sql-query
3
While PostgreSQL’s recursive query feature might be the way to go here, your relationship is also a special case of a directed acyclic graph (DAG). Take a look at https://stackoverflow.com/questions/4074794/simple-graph-search-algorithm-in-sql-postgresql and also consider switching from PostgreSQL to a graph database, like Neo4j.
Another possibility would be to add an additional ID to each record which associates parents, children, grandchildren, etc. into families. Parents and children could share a common family ID, which would be indexed, so retrieving all related records could be done in a single (fast) query. If you include backlinks, then it doesn’t matter where you start from: you can work forward and backward through the result set to order the nodes. You could also bring the entire family into RAM for fast manipulation outside of PostgreSQL. Yes, there would be some additional overhead in maintaining consistent family IDs across CRUD operations, but you might realize a net win in performance.
2
Postgres have ltree data structure. https://www.postgresql.org/docs/9.1/ltree.html
You can store ids of its ancestors to ltree and query based on that.
On insert a node we need to copy ltree of it parent and append it self. Query can use pattern matching.