Part 1
Clearly immutability minimizes the need for locks in multi-processor programming, but does it eliminate that need, or are there instances where immutability alone is not enough? It seems to me that you can only defer processing and encapsulate state so long before most programs have to actually DO something (update a data store, produce a report, throw an exception, etc.). Can such actions always be done without locks? Does the mere act of throwing out each object and creating a new one instead of changing the original (a crude view of immutability) provide absolute protection from inter-process contention, or are there corner cases which still require locking?
I know a lot of functional programmers and mathematicians like to talk about “no side effects” but in the “real world” everything has a side effect, even if it’s the time it takes to execute a machine instruction. I’m interested in both the theoretical/academic answer and the practical/real-world answer.
If immutability is safe, given certain bounds or assumptions, I want to know what the borders of the “safety zone” are exactly. Some examples of possible boundaries:
- I/O
- Exceptions/errors
- Interactions with programs written in other languages
- Interactions with other machines (physical, virtual, or theoretical)
Special thanks to @JimmaHoffa for his comment which started this question!
Part 2
Multi-processor programming is often used as an optimization technique – to make some code run faster. When is it faster to use locks vs. immutable objects?
Given the limits set out in Amdahl’s Law, when can you achieve better over-all performance (with or without the garbage collector taken into account) with immutable objects vs. locking mutable ones?
Summary
I’m combining these two questions into one to try to get at where the bounding box is for Immutability as a solution to threading problems.
This is an oddly phrased question that is really, really broad if answered fully. I’m going to focus on clearing up some of the specifics that you’re asking about.
Immutability is a design trade-off. It makes some operations harder (modifying state in large objects quickly, building objects piecemeal, keeping a running state, etc.) in favor of others (easier debugging, easier reasoning about program behavior, not having to worry about things changing underneath you when working concurrently, etc.). It’s this last one we care about with this question, but I want to emphasize that it is a tool. A good tool that often solves more problems than it causes (in most modern programs), but not a silver bullet, and not something that changes the intrinsic behavior of programs.
Now, what does it get you? Immutability gets you one thing: you can read the immutable object freely, without worrying about its state changing underneath you (assuming it is truly deeply immutable; an immutable object with mutable members is usually a deal breaker). That’s it. It frees you from having to manage concurrency for those reads (via locks, snapshots, data partitioning, or other mechanisms; the original question’s focus on locks alone is too narrow given the scope of the question).
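To illustrate the “truly deeply immutable” caveat, here is a minimal C# sketch (the type names are invented for illustration): every field of the first class is readonly, yet the object still exposes shared mutable state through a member.

```csharp
using System.Collections.Generic;
using System.Collections.Immutable;

// Shallowly immutable: the reference cannot be reassigned, but the list
// it points to can still be mutated by anyone else holding that list.
public sealed class ShallowConfig
{
    public readonly List<string> Hosts;
    public ShallowConfig(List<string> hosts) { Hosts = hosts; }
}

// Deeply immutable: the collection itself can never change after
// construction, so readers need no locks at all.
public sealed class DeepConfig
{
    public ImmutableList<string> Hosts { get; }
    public DeepConfig(ImmutableList<string> hosts) { Hosts = hosts; }
}
```

A thread iterating `ShallowConfig.Hosts` can still race with another thread calling `Add` on the same underlying list; only something like `DeepConfig` delivers the lock-free reading guarantee described above.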
It turns out, though, that lots of things read objects. I/O does, but I/O itself tends not to handle concurrent use well. Almost all processing does, but other objects may be mutable, or the processing itself might use state that is not friendly to concurrency. Copying an object is a big hidden trouble spot in some languages, since a full copy is (almost) never an atomic operation. This is where immutable objects help you.
As for performance, it depends on your app. Locks are (usually) heavy. Other concurrency management mechanisms are faster but have a high impact on your design. In general, a highly concurrent design that makes use of immutable objects (and avoids their weaknesses) will perform better than a highly concurrent design that locks mutable objects. If your program is lightly concurrent then it depends and/or doesn’t matter.
But performance should not be your highest concern. Writing concurrent programs is hard. Debugging concurrent programs is hard. Immutable objects help improve your program’s quality by eliminating opportunities for error implementing concurrency management manually. They make debugging easier because you’re not trying to track state in a concurrent program. They make your design simpler and thus remove bugs there.
So to sum up: immutability helps but will not eliminate challenges needed to handle concurrency properly. That help tends to be pervasive, but the biggest gains are from a quality perspective rather than performance. And no, immutability does not magically excuse you from managing concurrency in your app, sorry.
A function that accepts some value and returns some other value, and doesn’t disturb anything outside of the function, has no side effects, and is therefore thread-safe. If you want to consider things like how the way the function executes affects power consumption, that’s a different problem.
I am assuming that you’re referring to a Turing-complete machine that is executing some sort of well-defined programming language, where the implementation details are irrelevant. In other words, it shouldn’t matter what the stack is doing, if the function I’m writing in my programming language of choice can guarantee immutability within the confines of the language. I don’t think about the stack when I’m programming in a high-level language, nor should I have to.
To illustrate how this works, I’m going to offer a few simple examples in C#. For these examples to hold, we have to make a couple of assumptions: first, that the compiler follows the C# specification without error, and second, that the programs it produces are correct.
Let’s say I want a simple function that accepts a string collection, and returns a string that is a concatenation of all of the strings in the collection separated by commas. A simple, naïve implementation in C# might look like this:
public string ConcatenateWithCommas(ImmutableList<string> list)
{
    string result = string.Empty;
    bool isFirst = true;
    foreach (string s in list)
    {
        if (isFirst)
        {
            result += s;
            isFirst = false;
        }
        else
        {
            result += ", " + s;
        }
    }
    return result;
}
This example is immutable, prima facie. How do I know that? Because the string type is immutable. However, the implementation is not ideal: because strings are immutable, a new string object has to be created on each pass through the loop, replacing the object that result previously pointed to. This can negatively affect speed and put pressure on the garbage collector, since it has to clean up all of those intermediate strings.
Now, let’s say I do this:
public string ConcatenateWithCommas(ImmutableList<string> list)
{
    var result = new StringBuilder();
    bool isFirst = true;
    foreach (string s in list)
    {
        if (isFirst)
        {
            result.Append(s);
            isFirst = false;
        }
        else
        {
            result.Append(", " + s);
        }
    }
    return result.ToString();
}
Notice that I’ve replaced the string result with a mutable object, a StringBuilder. This is much faster than the first example, because a new string is not created each time through the loop. Instead, the StringBuilder object merely appends the characters from each string to an internal buffer, and outputs the whole thing at the end.
Is this function immutable, even though StringBuilder is mutable?
Yes, it is. Why? Because each time this function is called, a new StringBuilder is created, just for that call. So now we have a pure function that is thread-safe, but contains mutable components.
But what if I did this?
public class Concatenate
{
    private StringBuilder result = new StringBuilder();
    private bool isFirst = true;

    public string ConcatenateWithCommas(ImmutableList<string> list)
    {
        foreach (string s in list)
        {
            if (isFirst)
            {
                result.Append(s);
                isFirst = false;
            }
            else
            {
                result.Append(", " + s);
            }
        }
        return result.ToString();
    }
}
Is this method thread-safe? No, it isn’t. Why? Because the class is now holding state on which my method depends. A race condition is now present in the method: one thread may modify isFirst, but another thread may perform the first Append(), in which case I now have a comma at the beginning of my string that is not supposed to be there.
Why might I want to do it like this? Well, I might want the threads to accumulate the strings into my result without regard to order, or in the order that the threads come in. Maybe it’s a logger, who knows?
Anyway, to fix it, I put a lock statement around the method’s innards.
public class Concatenate
{
    private StringBuilder result = new StringBuilder();
    private bool isFirst = true;
    private readonly object locker = new object();

    public string AppendWithCommas(ImmutableList<string> list)
    {
        lock (locker)
        {
            foreach (string s in list)
            {
                if (isFirst)
                {
                    result.Append(s);
                    isFirst = false;
                }
                else
                {
                    result.Append(", " + s);
                }
            }
            return result.ToString();
        }
    }
}
Now it’s thread-safe again.
The only way that my immutable methods could possibly fail to be thread-safe is if the method somehow leaks part of its implementation. Could this happen? Not if the compiler is correct and the program is correct. Will I ever need locks on such methods? No.
For an example of how implementation could possibly be leaked in a concurrency scenario, see here.
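As a hypothetical sketch of what such leakage could look like (independent of the linked example): a method that returns its internal StringBuilder instead of a finished string hands callers a reference to its mutable internals.

```csharp
using System.Collections.Generic;
using System.Text;

public static class Leaky
{
    // Returning the builder itself leaks the method's mutable internals:
    // two threads sharing the returned reference can Append concurrently
    // and corrupt each other's results. Returning result.ToString() would
    // hand back an immutable snapshot instead.
    public static StringBuilder ConcatenateWithCommas(IEnumerable<string> list)
    {
        var result = new StringBuilder();
        bool isFirst = true;
        foreach (string s in list)
        {
            if (isFirst) { result.Append(s); isFirst = false; }
            else { result.Append(", ").Append(s); }
        }
        return result; // the leak: callers now share mutable state
    }
}
```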
I’m unsure if I understood your questions.
IMHO the answer is yes. If all your objects are immutable, then you don’t need any locks. But if you need to preserve a state (e.g. you implement a database or you need to aggregate the results from multiple threads) then you need to use mutability and therefore also locks. Immutability eliminates the need for locks, but usually you cannot afford to have completely immutable applications.
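A minimal sketch of the “aggregate the results from multiple threads” case (the names here are illustrative, not from the question): each unit of work is pure and lock-free, but combining the results into one shared collection is a mutation that needs a lock.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public static class Aggregator
{
    public static List<int> SquareAll(IEnumerable<int> inputs)
    {
        var results = new List<int>();  // shared mutable state
        var gate = new object();
        Parallel.ForEach(inputs, n =>
        {
            int square = n * n;                   // pure, needs no lock
            lock (gate) { results.Add(square); }  // mutable aggregation, locked
        });
        return results;
    }
}
```

The order of the results is unspecified, which echoes the point above: once you keep state across threads, you take on both locking and ordering concerns.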
Answer to part 2 – locks should always be slower than no locks.
Encapsulating a bunch of related state in a single mutable reference to an immutable object can make it possible for many kinds of state modification to be performed lock-free using the pattern:
do
{
    oldState = someObject.State;
    newState = oldState.WithSomeChanges();
} while (Interlocked.CompareExchange(ref someObject.State, newState, oldState) != oldState);
If two threads both try to update someObject.State simultaneously, both threads will read the old state and determine what the new state would be without each other’s changes. The first thread to execute the CompareExchange will store what it thinks the next state should be. The second thread will find that the state no longer matches what it had previously read, and will thus re-compute the proper next state of the system with the first thread’s changes taken into effect.
This pattern has the advantage that a thread which gets waylaid cannot block the progress of other threads. It has the further advantage that even when there is heavy contention, some thread will always be making progress. It has the disadvantage, though, that in the presence of contention a lot of threads may spend a lot of time doing work which they will end up discarding. For example, if 30 threads on separate CPUs all try to change an object simultaneously, one will succeed on its first attempt, one on its second, one on its third, etc., so that each thread ends up making about 15 attempts on average to update its data.

Using an “advisory” lock can improve things significantly: before a thread attempts an update, it should check whether a “contention” indicator is set. If so, it should acquire a lock before doing the update. If a thread makes a few unsuccessful attempts at an update, it should set the contention flag. If a thread which tries to acquire the lock finds there was nobody else waiting, it should clear the contention flag. Note that the lock here is not required for “correctness”; the code would work correctly even without it. The purpose of the lock is to minimize the amount of time code spends on operations which are not likely to succeed.
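The advisory-lock refinement described above might be sketched like this (a simplified illustration; the attempt threshold and the flag-clearing policy are choices made for this example, not part of any standard API):

```csharp
using System;
using System.Threading;

// Holds an immutable state object, updated via a CAS loop, with an
// advisory lock that kicks in once contention is observed.
public sealed class StateBox<T> where T : class
{
    private T state;
    private int contended;  // 0 = no contention observed, 1 = contended
    private readonly object advisory = new object();

    public StateBox(T initial) { state = initial; }

    public T State => Volatile.Read(ref state);

    public void Update(Func<T, T> withChanges)
    {
        if (Volatile.Read(ref contended) == 1)
        {
            lock (advisory)  // serialize updates while contention persists
            {
                UpdateCore(withChanges);
                Volatile.Write(ref contended, 0); // optimistically assume it passed
            }
            return;
        }
        UpdateCore(withChanges);  // common case: lock-free
    }

    private void UpdateCore(Func<T, T> withChanges)
    {
        int attempts = 0;
        T oldState, newState;
        do
        {
            oldState = Volatile.Read(ref state);
            newState = withChanges(oldState);
            if (++attempts > 3)  // repeated CAS failures: advise others to lock
                Volatile.Write(ref contended, 1);
        } while (Interlocked.CompareExchange(ref state, newState, oldState) != oldState);
    }
}
```

The lock never guards correctness, matching the answer’s point: `UpdateCore` alone is a valid update; the advisory path only reduces wasted recomputation under heavy contention.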
You start with “Clearly immutability minimizes the need for locks in multi-processor programming”.
Wrong. You need to carefully read the documentation for every class that you use. For example, const std::string in C++ is not thread-safe. Immutable objects can have internal state that changes when they are accessed.
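The same hazard can be sketched in C# (a hypothetical type, analogous to the reference-counted copy-on-write std::string implementations where even “const” reads mutated a counter): with a lazily cached field, a plain property read writes state.

```csharp
public sealed class CachedName
{
    private readonly string first, last;
    private string fullName;  // lazily computed cache

    public CachedName(string first, string last)
    {
        this.first = first;
        this.last = last;
    }

    // Looks like a pure read, but the first call writes a field. Two
    // threads reading concurrently both write fullName; here the race is
    // benign (both compute the same value), but with a stateful or
    // non-atomic cache it would not be.
    public string FullName
    {
        get
        {
            if (fullName == null)
                fullName = first + " " + last;
            return fullName;
        }
    }
}
```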
But you are looking at this from a totally wrong point of view. It doesn’t matter whether an object is immutable or not, what matters is whether you change it. What you are saying is like saying “if you never take a driving test, you can never lose your driving license for drunk driving”. True, but rather missing the point.
Now in the example code someone wrote with a function named “ConcatenateWithCommas”: If the input was mutable and you used a lock, what would you gain? If someone else tries to modify the list while you try to concatenate the strings, a lock can keep you from crashing. But you still don’t know whether you concatenate the strings before or after the other thread changed them. So your result is rather useless. You have a problem that isn’t related to locking and cannot be fixed with locking. But then if you use immutable objects, and the other thread replaces the whole object with a new one, you are using the old object and not the new object, so your result is useless. You have to think about these problems at an actual functional level.