In this article, Martin Kleppmann claims that using fencing tokens solves the issue of process pauses and uses the following diagram to demonstrate it:
Here we can see that Client 1's write gets rejected by the Storage service because the service has already seen a token with a higher value (34).
However, it looks to me as if a race condition could still occur if Client 1 wakes up a bit earlier and sends its write request with token 33 just before Client 2 sends its request with token 34. We could arrive in a situation where:
- Storage receives the write request with token 33 and updates last_seen_token to 33.
- Storage node starts writing the value associated with token 33.
- While still writing the value provided by Client 1, the Storage node receives the write request from Client 2 with token 34. Since 34 > 33, this write is also accepted, and we end up with two concurrent writes (see the sketch after this list).
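To make the interleaving concrete, here is a minimal sketch of a storage node with the non-atomic check-then-write I have in mind. Everything in it (the `Storage` type, `lastSeenToken`, the simulated slow write) is my own guess at the behaviour, not code from the article:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ErrStaleToken is returned when a write carries a token that the
// storage node has already seen superseded.
var ErrStaleToken = errors.New("write rejected: stale fencing token")

// Storage is a toy in-memory storage node. The field names are mine,
// not from the article.
type Storage struct {
	lastSeenToken int64
	value         []byte
}

// Write checks the fencing token and then performs the write as two
// separate, non-atomic steps -- the gap I'm worried about.
func (s *Storage) Write(token int64, data []byte) error {
	if token <= s.lastSeenToken { // step 1: check the token
		return ErrStaleToken
	}
	s.lastSeenToken = token           // step 2: record it
	time.Sleep(10 * time.Millisecond) // simulate a slow write to disk
	s.value = data                    // step 3: the value lands last
	return nil
}

func main() {
	s := &Storage{}
	go s.Write(33, []byte("from client 1")) // Client 1 wakes up first
	go s.Write(34, []byte("from client 2")) // Client 2 arrives mid-write
	time.Sleep(50 * time.Millisecond)
	// Nondeterministic: the final value may be Client 1's even though
	// token 34 was accepted later. `go run -race` flags the data race.
	fmt.Printf("final value: %s\n", s.value)
}
```

If Client 2's request arrives during the simulated write, both calls pass the token check, and which value survives depends purely on scheduling.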
What am I missing? It looks like the Storage node needs its own lock to ensure that checking/updating last_seen_token and writing the value form one atomic operation, roughly as sketched below. But if we're doing that, it seems to defeat the purpose of having a lock service in the first place.
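For concreteness, this is the kind of per-node atomicity I mean (again a sketch under my own assumptions, with an in-memory store):

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var ErrStaleToken = errors.New("write rejected: stale fencing token")

// SafeStorage holds a local mutex across both the token check and the
// write, so together they form one atomic step. Again, the names are
// illustrative assumptions, not the article's code.
type SafeStorage struct {
	mu            sync.Mutex
	lastSeenToken int64
	value         []byte
}

func (s *SafeStorage) Write(token int64, data []byte) error {
	s.mu.Lock()
	defer s.mu.Unlock() // check-and-write is now one critical section
	if token <= s.lastSeenToken {
		return ErrStaleToken
	}
	s.lastSeenToken = token
	s.value = data
	return nil
}

func main() {
	s := &SafeStorage{}
	fmt.Println(s.Write(33, []byte("from client 1"))) // <nil>
	fmt.Println(s.Write(34, []byte("from client 2"))) // <nil>
	fmt.Println(s.Write(33, []byte("late retry")))    // stale token error
}
```

With the mutex, the two writes are serialized on the storage node itself, which is exactly what makes me wonder what the distributed lock service still buys us.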