I am practicing using of immutable object in C++. My personal goal is representing generic object graph (in heap) with sequence of immutable graphs.
Building the multi-version graph itself isn’t that much hard. The problem is performance. Brute-force versioning needs full copy of graph, and this was not acceptable.
I tried to share unchanged nodes. But in this case, I got a new problem; references. Reference to other object must be updated in whole graph. This needs visiting all nodes for each time I derive a new graph version. And this mutates the nodes with references, so they also should be derived (by copying). Performance won’t be better much than brute-force copying.
As far as I can imagine, there’s no real efficient way to represent mutation of object graph with immutable states. So I am asking for some idea on this.
Is it possible to represent mutation of object graph efficiently with immutable state?
7
What you are looking for is called a Persistent Data Structure. The canonical resource for persistent data structures is Chris Okasaki’s Book Purely Functional Data Structures. Persistent data structures have gathered interest in recent times due to their popularization in Clojure and Scala.
However, for some strange reason, Persistent Graphs seem to be mostly ignored. We have lists, dozens of different kinds of trees, arrays, priority queues, maps, but no graphs.
I really only found one paper: Fully Persistent Graphs – Which One To Choose?
If you don’t consider connections between the objects to be part of your versioned resource (and you might — in which case the following probably doesn’t help much), you could consider splitting your objects into a part that represents the connectivity portion of the object and a part that represents the immutable state.
Then you could have each of the connectivity sub-objects contain a vector of the versioned states. This way you could operate with a graph version number to access the appropriate immutable state.
In order to avoid having to traverse the entire graph whenever there is an update to a specific node, you can make it so that if a node is accessed with a version number than is greater than the node’s current version number, the current version is used. If the node is then updated, you fill in all the intermediate versions with the previous version — thus allowing you to do lazy-updates to the object graph.
If the connectivity between objects is part of your versioned state, then the foregoing does not work. But maybe you can extend it as follows:
For each object in the graph, create a “handle object”. The handle object contains the list of versioned immutable states. Rather than storing object references in any of the objects of the graph, you would store a reference to the handle object. Then each references to the objects would be de-rereferenced through the handle using the handle and the version number of view of the object graph that was currently being processed. This would return the correct immutable state for the object. The immutable states use the handles for referring to other objects in the graph, so you always get the consistent date for the version of the graph you want to process.
The same logic described above applies for updating the versions within the handles — which allows lazy, localized updates.
There is a published solution to this problem with very good amortized time complexity, but its hard to find when you don’t know exactly what to look for. You can find a nice summary in these lecture notes (check section 2.2.3) or read the original paper Making Data Structures Persistent. If your object graph has limited connectivity (e.g. tree-like structures) you can even reach amortized O(1) complexity for both reading and updating the immutable graph, which is impressive.
The idea is that each object, besides storing its current immutable state, reserves space for recording changes. When you want to create a new immutable graph from an existing one by applying changes you have to consider two cases:
-
The object you want to change has still space for changes. You can store the changes directly in the object.
-
The object has no more space for changes. You create new instance based on the current values and empty change records. Now you need to recursively update all objects referencing the old object to reference the new object.
If the referencing object itself has still space for changes you can store the change in reference directly, otherwise it cascades recursively.
While this scheme supports efficient creation of new generations of immutable object graphs it complicates reading from it, because now you need to specify which “version” you want to access when reading data from an immutable object. This is because the immutable object may have information for multiple versions due to the stored change records, so you need to specify which version you want to look at.
When implementing this I try to hide this complexity in the API. When referencing something in the immutable graph from the outside, I use generated stubs instead of direct references, so callers don’t need to keep passing the desired version around manually. This makes references slightly more expensive than a direct pointer, but I find it worth the convenience.