I’ve been working with Akka daily for 7-8 months now.
When I started, I noticed that the applications I worked on used actors for communication between almost all objects inside the actor system. So I did the same – spin up another actor for x/y/z.
It seems to me that this may be too indiscriminate, adding complexity where it isn’t needed – but I can’t find any discussion of when to use actors versus plain synchronous logic, or even async logic via futures. I started pondering my stance after a coworker mentioned something similar. More recently I have noticed several cases where I considered a task and then avoided creating another actor because I could achieve the same result safely with an immutable implementation. A typical example is reading configuration values from a db or a file: you access it very infrequently and will wait for the result anyway.
In particular, it seems to me that wherever you’re working with immutable state, actors add complexity and limit throughput. A pure function in an object, for example, can be called from any number of threads with no risk, yet an actor can only process one message at a time. The counter-consideration is that you park the thread if you need to wait for the result, unless you start using futures. But where you don’t need to worry about async messaging or scale, employing an actor seems like overkill.
So my question is – is there a bad time to use actors? I’m curious how this looks in Erlang and would really like other people’s insight, or pointers to any principles around actor use.
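To make the pure-function point concrete, here is a minimal sketch using only the Scala standard library. The names `Pricing`, `withTax` and `PureFunctionDemo` are made up for illustration, and the 20% rate is arbitrary. The function touches no shared mutable state, so the calls can run in parallel on the global execution context without a mailbox serializing them.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical example: a pure function living in a plain object.
// It reads no shared mutable state, so many threads can call it at once.
object Pricing {
  def withTax(netCents: Long, ratePercent: Int): Long =
    netCents + netCents * ratePercent / 100
}

object PureFunctionDemo {
  // Fan the calls out as futures; nothing forces them to run one at a time.
  def grossTotals(netCents: Seq[Long]): Seq[Long] = {
    val futures = netCents.map(n => Future(Pricing.withTax(n, 20)))
    // Await parks the calling thread; it is used here only to keep the demo short.
    Await.result(Future.sequence(futures), 5.seconds)
  }
}
```

Calling `PureFunctionDemo.grossTotals(Seq(100L, 250L))` computes each total independently. Wrapping `withTax` in an actor would give the same answers but process the requests one message at a time.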
5
I am interested in this question too and have been doing some research on it. For other viewpoints, see this blog post by Noel Walsh or this question on Stack Overflow. I have some opinions I would like to offer:
- I think Akka, because it works with messages, encourages a “push mindset”. Often, for concurrency, I would argue this is not what you want. Pull is much safer. For example, one common pattern for distributed systems is to have a set of workers processing items from a queue. Obviously this is possible in Akka, but it doesn’t seem to be the first approach people try. Akka also offers durable mailboxes, but again it depends how you use them – a single shared queue is a lot more flexible than per-worker queues for balancing and re-assigning work.
- It’s easy to get into the mindset of replacing your classes with actors. In fact some people even seem to advocate it by saying actors should only do one thing. Taken to its logical conclusion, this increases code complexity as Jason describes, because if every class is an actor, that’s a lot of extra messages and receive / send blocks. It also makes the code harder to understand and test because you lose the formality of interfaces – and I am not convinced that comments are a solution to this. Also, despite Akka’s legendary efficiency, I suspect that actor proliferation is not a good idea performance-wise – when we use Java threads we know they are precious and conserve them accordingly.
- It’s related to the previous point, but another annoyance is the loss of type information that Noel and Pino highlight, as for many of us that’s why we are using Scala rather than other languages such as Python. There are some ways around this but they are either non-standard, not recommended or experimental.
- Finally, concurrency is hard, even if you have a “let it crash” mindset. Alternative programming models can help but they don’t make the problems go away – they change them – which is why it’s good to think about them formally. It’s also why Joe Average developer reaches for ready-built tools like RabbitMQ, Storm, Hadoop, Spark, Kafka or NoSQL databases. Akka does have some prebuilt tools and components, which is cool, but it also feels quite low level, so more ready-built common elements of distributed systems would help developers and ensure systems are built right.
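As a rough illustration of the pull model from the first point, here is a sketch with plain JVM threads and one shared queue rather than Akka. `Job`, `PullWorkers` and `sumOfSquares` are hypothetical names, and summing squares is just a stand-in for real work. Each worker pulls its next item when it is free, so load balances itself without per-worker mailboxes.

```scala
import java.util.concurrent.{ConcurrentLinkedQueue, Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicLong

// Hypothetical unit of work; the name and field are illustrative only.
final case class Job(n: Int)

object PullWorkers {
  def sumOfSquares(jobs: Seq[Job], workerCount: Int): Long = {
    // One shared queue for all workers instead of one queue per worker.
    val queue = new ConcurrentLinkedQueue[Job]()
    jobs.foreach(j => queue.add(j))
    val total = new AtomicLong(0L)
    val pool = Executors.newFixedThreadPool(workerCount)

    val worker: Runnable = () => {
      // Pull the next job when ready; poll() returns null once the queue is drained.
      var next = queue.poll()
      while (next != null) {
        total.addAndGet(next.n.toLong * next.n)
        next = queue.poll()
      }
    }

    (1 to workerCount).foreach(_ => pool.execute(worker))
    pool.shutdown()
    pool.awaitTermination(10, TimeUnit.SECONDS)
    total.get()
  }
}
```

A slow worker simply takes fewer jobs from the queue, so nothing has to be re-assigned. With push-style per-worker mailboxes, the sender has to make that balancing decision up front.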
Like Jason, I am keen to hear other people’s insight here. How can I address some of the issues above and use Akka better?
7
It’s worth considering what the actor model is used for: the actor model is
- a concurrency model
- that avoids concurrent access to mutable state
- using asynchronous communications mechanisms to provide concurrency.
This is valuable because using shared state from multiple threads gets really hard, especially when there are relationships among different components of the shared state that must be kept synchronized. However, if you have domain components in which:
- You don’t allow concurrency, OR
- You don’t allow mutable state (as in functional programming), OR
- You must rely on some synchronous communications mechanism,
then the actor model will not provide much (if any) benefit.
Hope that helps.
4
Your intuition is correct, IMHO. Using actors everywhere is like having the proverbial hammer and seeing only nails.
The Erlang best practice is to use processes/actors for all activities that happen concurrently, just as they would in real life. Sometimes it is difficult to find the right granularity, but most of the time you just know by looking at the modeled domain and using a bit of common sense. I’m afraid I don’t have a better method than that, but I hope it helps.
1
In order input/output messaging:
I recently came across an Akka-based application where the actor model actually caused concurrency problems; a simpler model would have held up better under load.
The problem was that the incoming messages were moving in different ‘lanes’ (through different actor paths), but the code assumed the messages would reach their final destination in the same order they had come in. As long as data arrived at sufficiently large intervals this worked, because there would be only a single conflicting message racing to the destination. When the intervals decreased, messages started arriving out of order and causing weird behavior.
The problem could have been solved correctly with fewer actors, but it’s an easy mistake to make when overusing them.
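For anyone who does need several lanes, one common workaround (not what that application did, just a sketch) is to stamp each message with a sequence number at the entry point and resequence at the destination. `Msg`, `Resequencer` and `ResequencerDemo` are hypothetical names, and the sketch assumes every sequence number from 0 upward shows up exactly once.

```scala
// Hypothetical message type: `seq` is a sequence number stamped at the entry point.
final case class Msg(seq: Long, payload: String)

// Immutable resequencer: holds back messages that arrive early and releases
// them only once every lower sequence number has been seen.
final case class Resequencer(nextSeq: Long = 0L, pending: Map[Long, Msg] = Map.empty) {
  def accept(msg: Msg): (Resequencer, Vector[Msg]) = {
    var buffer = pending + (msg.seq -> msg)
    var expected = nextSeq
    val ready = Vector.newBuilder[Msg]
    // Drain the buffer for as long as the next expected number is present.
    while (buffer.contains(expected)) {
      ready += buffer(expected)
      buffer -= expected
      expected += 1
    }
    (Resequencer(expected, buffer), ready.result())
  }
}

object ResequencerDemo {
  // Feed messages in arrival order and collect payloads in delivery order.
  def deliver(arrivals: Seq[Msg]): Vector[String] = {
    val (_, delivered) =
      arrivals.foldLeft((Resequencer(), Vector.empty[String])) {
        case ((state, out), msg) =>
          val (nextState, ready) = state.accept(msg)
          (nextState, out ++ ready.map(_.payload))
      }
    delivered
  }
}
```

This adds buffering and a failure mode of its own, since a lost message stalls everything behind it. That is part of why a single path with fewer actors is usually the simpler fix.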
2
In my opinion there are two use cases for actors: shared resources, such as ports and the like, and large state. The first has been covered well by the discussion so far, but large state is also a valid reason.
A large structure being passed with every procedure call can use a lot of stack. This state can be put into a separate process, the structure replaced by a process id, and that process queried as required.
Databases such as Mnesia can be thought of as storing state externally to the querying process.
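On the JVM the same idea can be sketched without Akka by giving the state a single owning thread and handing callers a reference to the owner instead of the structure itself. `StateHolder` and its methods are hypothetical names; the dedicated single-thread executor stands in for the owning process, which is in effect a hand-rolled actor.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical holder for a large structure. All access runs on one dedicated
// thread, which plays the role of the owning process; callers keep only a
// reference to the holder and query it when they need a value.
final class StateHolder(initial: Map[String, String]) {
  private val executor = Executors.newSingleThreadExecutor()
  private implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(executor)

  // Only ever read or written on the executor's single thread.
  private var state: Map[String, String] = initial

  def get(key: String): Future[Option[String]] = Future(state.get(key))

  def put(key: String, value: String): Future[Unit] =
    Future { state = state.updated(key, value) }

  def shutdown(): Unit = executor.shutdown()
}
```

Callers pass the `StateHolder` reference around much as they would a process id and pay for a lookup only when they need one, for example `Await.result(holder.get("region"), 5.seconds)` in a test.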
2