Background:
I am writing a small/partial IDE. Code is internally converted/parsed into a graph data structure (for fast navigation, syntax-check etc). Functionality to undo/redo (also between sessions) and restoring from crash is implemented by writing to and reading from journal. The journal records modifications to the graph (not to the source).
Question:
I am hoping for advice on a decision on data-structures and journal format.
For the graph I see two possible versions:
g-a Graph edges are implemented in the way that one node stores references to other nodes via memory address
g-b Every node has an ID. There is an ID-to-memory-address map. Graph uses IDs (instead of addresses) to connect nodes. Moving along an edge from one node to another each time requires lookup in ID-to-address map.
And also for the journal:
j-a There is a current node (like current working directory in a shell + file-system setting). The journal contains entries like “create new node and connect to current”, “connect first child of current node” (relative IDs)
j-b Journal uses absolute IDs, e.g. “delete edge 7 -> 5”, “delete node 5”
I could e.g. combine g-a with j-a or combine g-b with j-b. In principle also g-b and j-a should be possible.
[My first/original attempt was g-a and a version of j-b that uses addresses, but that turned out to cause severe restrictions: nodes cannot change their addresses (or journal would have to keep track of it), and using journal between two sessions is a mess (or even impossible)]
I wonder if variant a or variant b or a combination would be a good idea, what advantages and disadvantages they would have and especially if some variant might be causing troubles later.
There are 2 common data structures to represent graph(s): Adjacency Lists and Adjacency Matrices. They are listed here with their time/space complexities. Choose the one that’s most appropriate based on the (estimated) complexity/density of your graph and the operations you expect to do on it most.
In my experience, I’ve always preferred a directed adjacency matrix (2D array of bool or float storing the existence or weight of the edge).
- It’s dead simple to write/reason about.
- I’ve never had the need to add/remove vertices after initial construction time.
- The most common operation for my graphs is almost always querying, and that is O(1).
- I haven’t had graphs large enough to worry about space.
That said, they have the same interface, so they’re swappable.
2