In mark-sweep-compact garbage collection algorithm you have to stop-the-world when relocating objects because reference graph becomes inconsistent and you have to replace values of all references pointing to the object.
But what if you had a hash table with object ID as a key and pointer as value, and references would point to said ID instead of object address… then fixing references would only require changing one value and pause would only be needed if object is tried to be written into during copying…
Is there a mistake in my line of thought?
Updating references is not the only thing that requires a pause. The standard algorithms commonly grouped under “mark-sweep” all assume that the entire object graph remains unaltered while it’s being marked. Correctly handling modifications (new objects created, references changed) requires rather tricky alternative algorithms, like as the tri-color algorithm. The umbrella term is “concurrent garbage collection”.
But yes, updating references after compaction also needs a pause. And yes, using indirection (e.g. via a persistent object ID and a hash table to real pointers) can greatly reduce the pausing. It might even be possible to make this part lock-free if one so desires. It would still be as tricky to get right as any low-level shared-memory concurrency, but there is no fundamental reason it wouldn’t work.
However, it would have severe disadvantages. Aside from taking extra space (at least two extra words for all objects), it makes every dereference much more expensive. Even something as simple as getting an attribute now involves a full hash table search. I’d estimate the performance hit to be way worse than for incremental tracing.
5
All problems in computer science can be solved by another level of indirection … except for the problem of too many layers of indirection
Your approach does not immediately solve the problem of garbage collection, but only moves it up one level. And at what cost! Now, every memory access goes through another pointer dereference. We can’t cache the result location, because it might have been relocated in the meanwhile, we must always go through the object ID. In most systems, this indirection is not acceptable, and stopping the world is assumed to have a lower total runtime cost.
I said your proposition only moves the problem, not solves it. The issue is around the reuse of object IDs. The object IDs are now our equivalent of pointers, and there is only a finite amount of addresses. It is conceivable (esp. on a 32 bit system) that during the lifetime of your program, more than INT_MAX objects will have been created, e.g. in a loop like
while (true) {
Object garbage = new Object();
}
If we just increment the object ID for each object, we will run out of IDs at some point. Therefore we have to find out which IDs are still in use and which are free so that they can be reclaimed. Sound familiar? We are now back at square one.
5
There is no error in your line of thought, you’ve just described something very close to how the original Java garbage collector worked
The original Java virtual machine [6] and some Smalltalk virtual machines use indirect pointers, called handles in [6],to refer to objects. Handles allow easy relocation of objects during garbage collection since, with handles, there isonly one direct pointer to each object: the one in its handle. All other references to the object indirect through the han-dle. In such handle-based memory systems, while object addresses change over the lifetime of objects and therefore cannot be used for hashing, handle addresses remain constant.
Space and Time-Efficient Hashing of Garbage-Collected Objects
In Sun’s current implementation of the Java Virtual Machine, a reference to
a class instance is a pointer to a handle that is itself a pair of pointers: one to a table
containing the methods of the object and a pointer to the Class object that represents
the type of the object, and the other to the memory allocated from the Java
heap for the object data.The Java Virtual Machine Specification (1997)
So it does work, it has been tried, and its inefficiency led to the development of generational mark and sweep systems.
2