I am implementing a message queue where messages are distributed across nodes in a cluster. The goal is to design a system to be able to auto-scale without needing to keep a global map of each message and its location.
My question is: when I add a new node, I want to rebalance messages. How do I handle requests that come in after the node has been added, but before data has been copied? This problem seems to exist regardless of the hashing algorithm I use.
Take this sequence of events:
- There is a single node, A, with two messages, M1 and M2
- Add new node, B. M2 should be moved to B.
- The user attempts to delete M2. However, this fails because message is not yet found in B.
Ideally, I would route the request to node A while B is still being populated. But then B would have a stale state of data was modified.
How can I solve this problem of hot-swapping data between nodes without going as far as keeping a centralized database of all messages?