Background
I have two separate processes, WriteIt() and ReadIt(). One creates records, and the other processes the records in a DB cluster.
Once WriteIt() creates a record, it queues a ReadIt() task to process the same record.
To illustrate:
Unfortunately, the database write and replication take an unreliable amount of time, so ReadIt() has to keep checking for the presence of the updated record, which seems quite inefficient.
Question
This has got to be a common pattern for distributed systems. So my questions are:
1. Is there a general term (or terms) for this pattern, so that I can read about how to solve it? Unfortunately I don’t even know what the right terminology is so I’ve had a heck of a time doing research on Google/SO/Programmers.SE.
2. (for extra credit) Is there a specific common approach to solving this issue with SQLAlchemy/MySQL and Celery?
I recognize that #2 is pretty specific, so I would be happy with just #1 since I just need to be pointed in the right direction to research the pattern.
Yes, there is a specific name for this in distributed systems: eventual consistency.
The common approach is to synchronously write the data to an event store and then write to SQL. Your queued background read job can rest assured that once the data is in the event store, it’s a success and won’t be lost.
Usually people use a high-performance storage system that is very fast at handling and serialising a lot of concurrent writes for the event store.
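As a rough sketch of that write ordering (not a definitive implementation), the following uses only the standard library. InMemoryEventStore, sql_rows, task_queue and replicate_to_sql are hypothetical stand-ins for a real event store, the MySQL table, the Celery queue and the later SQL write; none of them come from the question.

```python
import itertools
import queue
import threading


class InMemoryEventStore:
    """Hypothetical stand-in for a fast, append-only event store."""

    def __init__(self):
        self._events = {}
        self._lock = threading.Lock()
        self._ids = itertools.count(1)

    def append(self, payload):
        # Returns only after the event is stored, so the write is synchronous.
        with self._lock:
            event_id = next(self._ids)
            self._events[event_id] = dict(payload)
            return event_id

    def get(self, event_id):
        with self._lock:
            return self._events[event_id]


event_store = InMemoryEventStore()
sql_rows = {}                # stand-in for the (possibly lagging) SQL table
task_queue = queue.Queue()   # stand-in for the task queue


def write_it(payload):
    """Store the event first, then queue the read job."""
    event_id = event_store.append(payload)
    task_queue.put(event_id)
    return event_id


def replicate_to_sql(event_id):
    """The SQL write, which may happen any time after the event is stored."""
    sql_rows[event_id] = event_store.get(event_id)


def read_it(event_id):
    """Process the record from the event store; no polling of SQL needed."""
    return event_store.get(event_id)
```

Here read_it(task_queue.get()) can run before replicate_to_sql has been called at all, because the queued job only depends on the event store write that already completed.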
A good approach is often called Command Query Responsibility Segregation (CQRS) with event sourcing.
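To make the idea a bit more concrete, here is a minimal sketch of the shape such a design usually takes: commands only append events, and queries read a view that is folded from those events. The event types (RecordCreated, RecordProcessed) and the EventLog, CommandHandler and project names are illustrative assumptions, not part of any particular library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordCreated:
    record_id: int
    body: str


@dataclass(frozen=True)
class RecordProcessed:
    record_id: int


class EventLog:
    """Hypothetical append-only log; the single source of truth."""

    def __init__(self):
        self.events = []

    def append(self, event):
        self.events.append(event)


class CommandHandler:
    """Write side: commands never update a view, they only append events."""

    def __init__(self, log):
        self.log = log

    def create_record(self, record_id, body):
        self.log.append(RecordCreated(record_id, body))

    def mark_processed(self, record_id):
        self.log.append(RecordProcessed(record_id))


def project(events):
    """Read side: fold the events, in order, into a queryable view."""
    view = {}
    for event in events:
        if isinstance(event, RecordCreated):
            view[event.record_id] = {"body": event.body, "processed": False}
        elif isinstance(event, RecordProcessed) and event.record_id in view:
            view[event.record_id]["processed"] = True
    return view
```

Because the view is derived from the log, it can lag behind or be rebuilt from scratch without losing data, which is the eventual-consistency trade-off described above.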