There’s an entity that gets updated from external sources, with update events arriving at random intervals, and the entity has to be processed once updated. Multiple updates may be coalesced into a single run; in other words, only the most current state of the entity needs to be processed.
There’s a point of no return during processing, at which the current state of the entity (a consistent state, i.e. no partial update) is saved somewhere else, and processing continues independently of any arriving updates.
Every subsequent set of updates has to trigger processing, i.e. the system must not forget about updates. And for each entity there should be at most one processing run before the point of no return, i.e. the same entity state should not be processed more than once.
So what I’m looking for is a pattern to cancel the current processing before the point of no return, or to abandon its results, if an update arrives. The main challenge is to minimize race conditions and maintain integrity.
The entity lives mainly in a database, with some files on disk. The system is built on .NET with web services and message queues.
What comes to my mind is a database queue-like table. An arriving update inserts a row into that table and processing is launched. The processing gathers the necessary data before the point of no return, and once it reaches this barrier it looks into the queue table to check whether there are more recent updates for the entity. If there are, the processing simply shuts down and its data is discarded. Otherwise the processing data is persisted and it goes beyond the point of no return.
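To make the idea concrete, here is roughly what that check at the barrier could look like; the EntityUpdates table and its column names are made up purely for this illustration:

```csharp
// Minimal sketch of the "check before the point of no return" idea.
// The EntityUpdates table and its columns are hypothetical.
using System.Data.SqlClient;

public sealed class UpdateChecker
{
    private readonly string _connectionString;

    public UpdateChecker(string connectionString)
    {
        _connectionString = connectionString;
    }

    // True if an update newer than the one this run started from has arrived,
    // in which case the caller discards its results and shuts down.
    public bool NewerUpdateExists(long entityId, long startedFromUpdateId)
    {
        const string sql =
            @"SELECT COUNT(*) FROM EntityUpdates
              WHERE EntityId = @entityId AND UpdateId > @startedFrom";

        using (var conn = new SqlConnection(_connectionString))
        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@entityId", entityId);
            cmd.Parameters.AddWithValue("@startedFrom", startedFromUpdateId);
            conn.Open();
            return (int)cmd.ExecuteScalar() > 0;
        }
    }
}
```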
Though it looks like a solution to me, it is not quite elegant, and I believe this scenario may be supported by some sort of middleware.
If I were to use message queues for this, I would need to access the queue API at the point of no return to check for the existence of new messages. This approach also lacks elegance.
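For example, with MSMQ the check would amount to something like the following; a dedicated queue per entity (the path below) is purely an assumption to keep the sketch short:

```csharp
// Rough sketch of checking an MSMQ queue for newer updates at the barrier.
// A dedicated queue per entity (the path below) is a simplifying assumption.
using System;
using System.Messaging;

public static class QueueCheck
{
    public static bool HasPendingUpdates(string entityId)
    {
        // Hypothetical per-entity queue naming scheme.
        var path = @".\private$\entity-updates-" + entityId;
        using (var queue = new MessageQueue(path))
        {
            try
            {
                queue.Peek(TimeSpan.Zero); // non-blocking peek
                return true;               // a newer update is waiting
            }
            catch (MessageQueueException e)
            {
                if (e.MessageQueueErrorCode == MessageQueueErrorCode.IOTimeout)
                    return false;          // the queue is empty
                throw;
            }
        }
    }
}
```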
Is there a name for this pattern and an existing solution?
I would separate the point-of-no-return processor from the pre-setup piece. One service picks up the updates and does whatever setup is needed for offline processing. Then, just before handing off to the offline processor, it checks whether any new updates came in, either via the database or, if it’s all on a single machine, via signals; in .NET, EventWaitHandles named for the entity ID work for this. If new updates arrived, the pre-processor simply goes back to the start, pulls them in, and gets everything ready again for the offline processor. Each time it reaches the hand-off point, it repeats this check.
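As a rough sketch of the single-machine variant (the event name format and the class shape here are illustrative assumptions, not a prescribed design):

```csharp
// Sketch of the single-machine signaling idea: the pre-processor holds a
// named event for the entity while it works, and the update receiver sets it.
// The "Global\entity-updated-{id}" name format is a hypothetical convention.
using System;
using System.Threading;

public sealed class PreProcessRun : IDisposable
{
    private readonly EventWaitHandle _updated;

    public PreProcessRun(long entityId)
    {
        // Created (or opened) non-signaled and kept open for the whole run,
        // so the kernel object survives between the updater's Set and our check.
        _updated = new EventWaitHandle(false, EventResetMode.ManualReset,
                                       "Global\\entity-updated-" + entityId);
    }

    // Called by whatever receives an external update for this entity.
    public static void NotifyUpdated(long entityId)
    {
        using (var handle = new EventWaitHandle(false, EventResetMode.ManualReset,
                                                "Global\\entity-updated-" + entityId))
            handle.Set();
    }

    // Polled just before hand-off: true means new updates arrived, so the
    // pre-processor should reset the signal and start over with fresh data.
    public bool UpdateArrived()
    {
        return _updated.WaitOne(0);
    }

    // Reset before re-reading updates so that later signals are not lost.
    public void ResetSignal()
    {
        _updated.Reset();
    }

    public void Dispose()
    {
        _updated.Dispose();
    }
}
```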
> The processing gathers the necessary data before the point of no return, and once it reaches this barrier it looks into the queue table to check whether there are more recent updates for the entity. If there are, the processing simply shuts down and its data is discarded. Otherwise the processing data is persisted and it goes beyond the point of no return.
Depending on the frequency of updates received, the system can enter periods of starvation, where just-processed updates are discarded continuously because new ones keep arriving.
Instead of throwing away the computations, you can keep a stack of the generated outputs.
Take a look at LMAX Architecture: http://martinfowler.com/articles/lmax.html
Assuming that you’re okay with limiting your data repository options to Microsoft SQL Server, you could use Service Broker to handle your messaging and queuing.
Since this would all be encapsulated within the database engine, there wouldn’t need to be any external API calls from the point-of-no-return check either. All of the logic could be written into stored procedures (either in T-SQL or as CLR procedures). Also, with Service Broker Activation, you can have other programs (e.g. your own executable) run on demand, whenever there’s work for them to do.
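As a rough illustration, receiving from a Service Broker queue from .NET could look like the following; the queue name and message shape are assumptions, and real code would wrap the RECEIVE in a transaction and end the conversation properly:

```csharp
// Rough sketch of receiving from a Service Broker queue in .NET.
// The queue name (dbo.EntityUpdateQueue) and message shape are hypothetical;
// production code would run the RECEIVE inside a transaction and manage
// the conversation lifecycle.
using System;
using System.Data.SqlClient;

public static class BrokerReceive
{
    public static string TryReceiveOne(string connectionString)
    {
        const string sql = @"
            WAITFOR (
                RECEIVE TOP (1) CAST(message_body AS NVARCHAR(MAX)) AS body
                FROM dbo.EntityUpdateQueue
            ), TIMEOUT 1000;"; // give up after one second

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sql, conn))
        {
            conn.Open();
            var body = cmd.ExecuteScalar(); // null when the queue stayed empty
            return body == null || body == DBNull.Value ? null : (string)body;
        }
    }
}
```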
As an added bonus, Service Broker is easily scalable, so you can offload data processing to other servers and keep your primary database server from being bogged down by irregular processing loads; just set up multiple instances and point Service Broker at the appropriate endpoints.
However, some of the drawbacks of using Service Broker are:
- Additional learning curve for developers (learning a new technology).
- More complicated to troubleshoot for any non-DBA type tech support.
- Requires Standard Edition or higher (with SQL Server 2012) – not available with the Express Edition (i.e. the free version).
I was able to implement the pattern largely following Jimmy Hoffa’s advice and ideas.
The pattern has been working for us for several months now, and it goes as follows.
- Every time the Entity is updated, we insert a new row into the EntityRevision table. This table has an auto-increment (identity) RevisionId field, which we pass along to Pre-Processing.
- During Pre-Processing we extract the state of the Entity associated with the RevisionId being processed and work with that. The point here is that we avoid any queries that return the current Entity state, because it’s constantly changing as the Entity is updated. In contrast, the state associated with a Revision never changes, so it’s safe to use during concurrent updates.
- After Pre-Processing, the Entity’s RevisionId is sent to the offline processing Queue. Say the arriving RevisionId is X; we apply the following rules (a sketch of these Queue operations follows the list):
  a) If there’s no other Revision for the same Entity in the Queue, we add X to the Queue.
  b) If there’s already a RevisionId Y for the same Entity in the Queue:
    - if Y > X, we discard X. In other words, we don’t need the earlier Revision X to be processed offline.
    - if X > Y, we remove Y from the Queue and add X. Since Y is earlier than X, we replace it with X.
  We make Queue operations execute in serial order, so that at most one process/thread alters the Queue at any time. For now we use a Mutex in the Queue operations.
- The Offline Processing polls the Queue on a periodic schedule and does what it needs to do with the Revisions in the Queue. Here, again, we use only the state of the Entity associated with its Revision.
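For illustration, here is a minimal in-memory sketch of the Queue rules above; the names are made up for the example, and the Queue could just as well live in the database:

```csharp
// Minimal in-memory sketch of the revision Queue rules (a/b above).
// Names are hypothetical; the real Queue could be a database table.
using System.Collections.Generic;
using System.Threading;

public sealed class RevisionQueue
{
    // At most one pending RevisionId per EntityId.
    private readonly Dictionary<long, long> _pending = new Dictionary<long, long>();
    private readonly Mutex _mutex = new Mutex(); // serializes all Queue operations

    // Called after Pre-Processing with the just-processed RevisionId (X).
    public void Enqueue(long entityId, long revisionId)
    {
        _mutex.WaitOne();
        try
        {
            long existing; // Y, if present
            if (!_pending.TryGetValue(entityId, out existing) || revisionId > existing)
                _pending[entityId] = revisionId; // rule a, or rule b with X > Y
            // otherwise Y >= X: discard X (rule b, Y > X)
        }
        finally { _mutex.ReleaseMutex(); }
    }

    // Called by Offline Processing on its periodic poll; drains the Queue.
    public List<KeyValuePair<long, long>> DrainAll()
    {
        _mutex.WaitOne();
        try
        {
            var snapshot = new List<KeyValuePair<long, long>>(_pending);
            _pending.Clear();
            return snapshot;
        }
        finally { _mutex.ReleaseMutex(); }
    }
}
```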
While we guarantee that the current pre-processed Revision is in the Queue, we are still not protected from the same Entity being updated, or an update being pre-processed, during or right after an ongoing Offline Processing run. In that case the update will have to wait in the Queue for the next Offline Processing launch. But that’s OK; we can’t prohibit updates 🙂 In fact, those updates are the essence of the system.