I really like scope-based memory management (SBMM), or RAII, as it is more commonly (confusingly?) referred to by the C++ community. As far as I know, except for C++ (and C), there’s no other mainstream language in use today that makes SBMM/RAII its main memory management mechanism; instead, they prefer to use garbage collection (GC).
I find this rather confusing, since
- SBMM makes programs more deterministic (you can tell exactly when an object is destroyed);
- in languages that use GC you often have to do manual resource management (see closing files in Java, for example), which partly defeats the purpose of GC and is also error prone;
- heap memory can also (very elegantly, imo) be scope-bound (see std::shared_ptr in C++).
Why isn’t SBMM more widely used? What are its disadvantages?
Let’s start by postulating that memory is by far (dozens, hundreds or even thousands of times) more common than all other resources combined. Every single variable, object, and object member needs some memory allocated to it and freed later on. For every file you open, you create dozens to millions of objects to store the data pulled out of the file. Every TCP stream goes together with an unbounded number of temporary byte strings created to be written to the stream. Are we on the same page here? Great.
For RAII to work (even if you have ready-made smart pointers for every use case under the sun), you need to get ownership right. You need to analyse who should own this or that object, who should not, and when ownership should be transferred from A to B. Sure, you could use shared ownership for everything, but then you’d be emulating a GC via smart pointers. At that point it becomes much easier and faster to build the GC into the language.
Garbage collection frees you from this concern for by far the most commonly used resource, memory. Sure, you still need to make the same decision for other resources, but those are far less common (see above), and complicated (e.g. shared) ownership is less common too. The mental burden is reduced significantly.
Now, you name some downsides to making all values garbage collected. However, integrating both memory-safe GC and value types with RAII into one language is extremely hard, so perhaps it’s better to mitigate these trade-offs via other means?
The loss of determinism turns out to be not that bad in practice, because it only affects deterministic object lifetime. As described in the next paragraph, most resources (aside from memory, which is plentiful and can be recycled rather lazily) are not bound to object lifetime in these languages. There are a few other use cases, but they are rare in my experience.
Your second point, manual resource management, is nowadays addressed via a statement that does perform scope-based cleanup but does not couple this cleanup to the object lifetime (and hence does not interact with the GC or with memory safety). This is using in C#, with in Python, and try-with-resources in Java 7 and later.
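As a minimal Python sketch of this decoupling, assuming a made-up Connection class that stands in for any external resource: the with statement calls __exit__ when the block is left, whether normally or through an exception, but the Connection objects themselves remain alive afterwards, and reclaiming their memory is left to the runtime.

```python
class Connection:
    """Made-up stand-in for an external resource such as a socket or file."""

    def __init__(self, name):
        self.name = name
        self.is_open = True

    def close(self):
        self.is_open = False

    # Context-manager protocol: __exit__ runs on every exit from the with-block.
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions


with Connection("db") as conn:
    still_open_inside = conn.is_open

try:
    with Connection("cache") as failing:
        raise RuntimeError("boom")
except RuntimeError:
    pass

# Both resources were released at block exit, but both objects are still alive
# and reachable through `conn` and `failing`; their memory is left to the runtime.
print(still_open_inside, conn.is_open, failing.is_open)  # True False False
```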
RAII also follows from automatic reference-counting memory management, e.g. as used by Perl. While reference counting is easy to implement, deterministic, and quite performant, it cannot deal with circular references (they cause a leak), which is why it isn’t commonly used.
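The cycle problem can be observed directly in CPython, which combines reference counting with a separate cycle collector. In this sketch (the Node class is made up for illustration), automatic collection is switched off so that only reference counting is in effect, and a weak reference lets us check whether the object still exists without keeping it alive. This relies on CPython behaviour; other implementations such as PyPy do not use reference counting.

```python
import gc
import weakref


class Node:
    """Made-up node type, used only to build a reference cycle."""

    def __init__(self, name):
        self.name = name
        self.other = None


def make_cycle():
    a, b = Node("a"), Node("b")
    a.other, b.other = b, a   # a -> b -> a: each keeps the other's count above zero
    return weakref.ref(a)     # a weak reference does not keep `a` alive


gc.disable()                  # leave only reference counting in charge
probe = make_cycle()
leaked = probe() is not None  # the cycle survives although nothing outside refers to it
gc.collect()                  # an explicit run of CPython's cycle collector
freed = probe() is None
gc.enable()

print(leaked, freed)  # True True
```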
Garbage-collected languages can’t use RAII directly, but they often offer syntax with an equivalent effect. In Java, we have the try-with-resources statement
try (BufferedReader br = new BufferedReader(new FileReader(path))) { ... }
which automatically calls .close() on the resource on block exit. C# has the IDisposable interface, which allows .Dispose() to be called when leaving a using (...) { ... } statement. Python has the with statement:
with open(filename) as f:
    ...
which works in a similar fashion. In an interesting spin on this, Ruby’s File.open method takes a block (effectively a callback). After the block has been executed, the file is closed.
File.open(name, mode) do |f|
  ...
end
I think Node.js uses the same strategy.
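Ruby’s block-based approach can be mimicked in any language with first-class functions. The Python sketch below uses a hypothetical helper, with_open_file, which opens a file, hands it to a callback, and closes it in a finally clause, so the file is closed even if the callback raises.

```python
import os
import tempfile


def with_open_file(path, mode, callback):
    """Hypothetical helper in the style of Ruby's File.open with a block:
    open the file, pass it to `callback`, and always close it afterwards."""
    f = open(path, mode)
    try:
        return callback(f)
    finally:
        f.close()  # runs even if the callback raises


fd, path = tempfile.mkstemp()
os.close(fd)  # we only need the file name

seen = []


def read_and_remember(f):
    seen.append(f)  # keep a reference so we can inspect the file object later
    return f.read()


with_open_file(path, "w", lambda f: f.write("hello"))
content = with_open_file(path, "r", read_and_remember)
os.remove(path)

print(content, seen[0].closed)  # hello True
```

The caller never sees an unclosed file escape the helper, which is the same guarantee the with statement gives, just expressed as a higher-order function.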
In my opinion, the most convincing advantage of garbage collection is that it allows for composability. Correctness of memory management is a local property in a garbage-collected environment. You can look at each part in isolation and determine whether it can leak memory. Combine any number of memory-correct parts and they stay correct.
When you rely on reference counting, you lose that property: whether your application can leak memory becomes a global property of the whole application. Every new interaction between parts has the potential to use the wrong ownership and break memory management.
This has a very visible effect on the design of programs in the different languages. Programs in GC languages tend to be more like soups of objects with lots of interactions, while in GC-less languages one tends to prefer structured parts with strictly controlled and limited interactions between them.
Closures are an essential feature of pretty much all modern languages. They are very easy to implement with GC and very hard (though not impossible) to get right with RAII, since one of their main features is that they allow you to abstract over the lifetime of your variables!
C++ only got them 40 years after everybody else did, and it took a lot of hard work by a lot of smart people to get them right. In contrast, many scripting languages designed and implemented by people with zero knowledge of designing and implementing programming languages have them.
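A small Python illustration of what “abstracting over the lifetime of your variables” means (make_counter is a made-up name): count is a local variable of make_counter, yet it must survive after make_counter returns, because the returned closure still uses it. With GC, nothing special is needed. In C++, a lambda that captured count by reference would be left with a dangling reference here, so the programmer has to choose a capture strategy (by value, or via a smart pointer) explicitly.

```python
def make_counter():
    count = 0  # a local variable of make_counter

    def increment():
        nonlocal count
        count += 1
        return count

    # `count` must outlive this call: the closure keeps it alive, and the
    # runtime frees it only once no closure refers to it any more.
    return increment


counter = make_counter()
counter()
counter()
print(counter())  # 3
```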
- SBMM makes programs more deterministic (you can tell exactly when an object is destroyed);
For most programmers the OS is non-deterministic, their memory allocator is non-deterministic and most of the programs they write are concurrent and, therefore, inherently non-deterministic. Adding the constraint that a destructor is called exactly at the end of scope rather than slightly before or slightly after is not a significant practical benefit for the vast majority of programmers.
- in languages that use GC you often have to do manual resource management (see closing files in Java, for example), which partly defeats the purpose of GC and is also error prone;
See using in C# and use in F#.
- heap memory can also (very elegantly, imo) be scope-bound (see std::shared_ptr in C++).
In other words, you could take the heap, which is a general-purpose solution, and change it to work only in a specific case that is seriously limiting. That is true, of course, but useless.
Why is not SBMM more widely used? What are its disadvantages?
SBMM limits what you can do:
- SBMM creates the upward funarg problem with first-class lexical closures, which is why closures are popular and easy to use in languages like C# but rare and tricky in C++. Note that there is a general trend towards the use of functional constructs in programming.
- SBMM requires destructors, and they impede tail calls by adding more work to do before a function can return. Tail calls are useful for extensible state machines and are provided by platforms such as .NET.
- Some data structures and algorithms are notoriously difficult to implement using SBMM: basically anywhere cycles occur naturally, most notably in graph algorithms. You effectively end up writing your own GC.
- Concurrent programming is harder because control flow and, therefore, object lifetimes are inherently non-deterministic there. Practical solutions in message-passing systems tend to be deep copying of messages and the use of excessively long lifetimes.
- SBMM keeps objects alive until the end of their scope in the source code, which is often longer than necessary and can be far longer. This increases the amount of floating garbage (unreachable objects waiting to be recycled). In contrast, tracing garbage collection tends to free objects soon after the last reference to them disappears, which can be much sooner. See Memory management myths: promptness.
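To make the “you effectively end up writing your own GC” point concrete, here is a sketch of one common workaround, assuming an arena design (the Graph class and its methods are hypothetical): a single container owns every node and edges are plain indices, so cycles between nodes create no ownership cycles. The price is that nothing can be freed individually; reclaiming unreachable nodes needs a reachability pass, which is exactly a tiny mark-and-sweep collector.

```python
class Graph:
    """Hypothetical arena-style graph: the Graph owns every node, and edges are
    plain integer indices, so cycles between nodes are not ownership cycles."""

    def __init__(self):
        self.values = []  # node payloads, indexed by node id
        self.edges = []   # edges[i] = list of node ids that node i points to

    def add_node(self, value):
        self.values.append(value)
        self.edges.append([])
        return len(self.values) - 1

    def add_edge(self, src, dst):
        self.edges[src].append(dst)

    def collect(self, roots):
        """Keep only nodes reachable from `roots` (a tiny mark-and-sweep).
        Returns a dict mapping old ids to new ids for the surviving nodes."""
        marked = set()
        stack = list(roots)
        while stack:  # mark phase: depth-first walk from the roots
            node = stack.pop()
            if node not in marked:
                marked.add(node)
                stack.extend(self.edges[node])
        survivors = sorted(marked)
        remap = {old: new for new, old in enumerate(survivors)}
        # Sweep phase: rebuild the arena with renumbered ids. Every edge of a
        # marked node points to a marked node, so remap[d] always exists.
        self.values = [self.values[old] for old in survivors]
        self.edges = [[remap[d] for d in self.edges[old]] for old in survivors]
        return remap


g = Graph()
a, b, c = g.add_node("a"), g.add_node("b"), g.add_node("c")
g.add_edge(a, b)
g.add_edge(b, a)  # a <-> b cycle, reachable from the root
g.add_edge(c, c)  # c -> c self-cycle, unreachable from the root
g.collect([a])
print(g.values, g.edges)  # ['a', 'b'] [[1], [0]]
```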
SBMM is so limiting that programmers need an escape route for situations where lifetimes cannot be made to nest. In C++, shared_ptr offers an escape route, but it can be ~10x slower than tracing garbage collection. So using SBMM instead of GC would wrong-foot most people most of the time. That is not to say, however, that it is useless: SBMM is still valuable in systems and embedded programming, where resources are limited.
FWIW, you might like to check out Forth and Ada, and read up on the work of Niklaus Wirth.
Looking at a popularity index like TIOBE (which is debatable, of course, but I guess for your kind of question it’s OK to use), you first see that ~50% of the top 20 are “scripting languages” or “SQL dialects”, where ease of use and means of abstraction are much more important than deterministic behaviour. Of the remaining “compiled” languages, around 50% use SBMM and ~50% don’t. So when you take the scripting languages out of the calculation, I would say your assumption is just wrong: among compiled languages, the ones with SBMM are as popular as the ones without.
One major advantage of a GC system that nobody has mentioned yet is that a reference in a GC system is guaranteed to retain its identity as long as it exists. If one calls IDisposable.Dispose (.NET) or AutoCloseable.close (Java) on an object while copies of the reference exist, those copies will continue to refer to the same object. The object won’t be useful for anything anymore, but attempts to use it will have predictable behavior controlled by the object itself. By contrast, in C++, if code calls delete on an object and later tries to use it, the behavior is undefined, and the entire state of the system becomes unreliable.
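The same guarantee holds in Python, so here is a small sketch that uses io.StringIO in place of a .NET or Java resource: after close(), a second reference still points to the same object, and using it raises a ValueError rather than touching freed memory.

```python
import io

buf = io.StringIO("some data")
alias = buf  # a second reference to the same object
buf.close()  # plays the role of Dispose()/close(): release the resource

# The alias still refers to the same (now closed) object, and using it fails
# in a well-defined way that the object itself controls.
raised = False
try:
    alias.read()
except ValueError:
    raised = True

print(alias is buf, alias.closed, raised)  # True True True
```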
Another important thing to note is that scope-based memory management works very well for objects with clearly defined ownership. It works much less well, and sometimes downright badly, with objects that have no defined ownership. In general, mutable objects should have owners, while immutable objects don’t need to, but there’s a wrinkle: it’s very common for code to use an instance of a mutable type to hold immutable data, by ensuring that no reference will ever be exposed to code that might mutate the instance. In such a scenario, instances of the mutable class might be shared among multiple immutable objects and thus have no clear owner.
First off, it’s very important to realize that RAII is not the same thing as SBMM, or even as SBRM. One of the most essential (and least known, or most underappreciated) qualities of RAII is that it makes ‘being a resource’ a property that is NOT transitive to composition.
The following blog post discusses this important aspect of RAII and contrasts it with resource management in GCed languages that use non-deterministic GC.
It’s important to note that while RAII is mostly used in C++, Python (at least CPython, the reference implementation) has destructors and deterministic GC that allow RAII to be used together with GC. Best of both worlds, as it were.
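Here is a minimal sketch of that non-transitivity in CPython (the FileHandle and Document classes are made up for illustration). Document owns a resource but defines no cleanup method of its own. Even so, the handle’s __del__ runs as soon as the last reference to the Document disappears. In Java or C#, the containing class would typically have to implement AutoCloseable or IDisposable itself and forward the call. The prompt __del__ call depends on CPython’s reference counting; other implementations may delay it.

```python
released = []


class FileHandle:
    """Made-up resource type that records when it is released."""

    def __init__(self, name):
        self.name = name

    def __del__(self):
        # In CPython this runs as soon as the last reference disappears.
        released.append(self.name)


class Document:
    """Holds a resource but defines no cleanup code of its own."""

    def __init__(self):
        self.handle = FileHandle("doc.txt")


def use_document():
    doc = Document()
    return doc.handle.name  # returns a str; nothing keeps `doc` alive afterwards


name = use_document()
print(name, released)  # doc.txt ['doc.txt']
```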