We have a system (MS stack: .NET, SQL) that receives data from thousands of remote devices (several independent readings per minute from each). We currently save all the data to a database as it arrives, and a second service reads and processes it, using the database to 'buffer' the input.
We want the second process to be scalable, as it can take quite a while to complete. Ideally we should be able to run multiple instances of it, but we have to make sure the data is processed in a specific order (we cannot have two processes working on data from the same remote device at the same time). We've been using the database to manage this: each instance reads all the data from one device at a time and prevents other instances from reading that device's data until it is finished.
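The claim-one-device-at-a-time pattern described above can be sketched roughly as follows. This is only an illustration: it uses SQLite (the real system is on the Microsoft SQL stack), and the table and column names are assumptions, not the asker's schema.

```python
# Sketch of the "claim all pending data for one device" pattern,
# using SQLite purely for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE readings (
        id        INTEGER PRIMARY KEY,   -- arrival order
        device_id TEXT NOT NULL,
        payload   TEXT NOT NULL,
        claimed   INTEGER NOT NULL DEFAULT 0
    );
""")

def claim_device(conn):
    """Atomically claim every unclaimed reading for one device.

    Returns (device_id, rows) or (None, []) if nothing is pending.
    Because select-and-mark happens in one transaction, a second
    worker cannot grab the same device's data concurrently.
    """
    with conn:  # one transaction: select + mark must be atomic
        row = conn.execute(
            "SELECT device_id FROM readings WHERE claimed = 0 LIMIT 1"
        ).fetchone()
        if row is None:
            return None, []
        device = row[0]
        conn.execute(
            "UPDATE readings SET claimed = 1 WHERE device_id = ?",
            (device,),
        )
        rows = conn.execute(
            "SELECT id, payload FROM readings "
            "WHERE device_id = ? ORDER BY id",
            (device,),
        ).fetchall()
        return device, rows

# Example: two devices' readings interleaved on arrival.
conn.executemany(
    "INSERT INTO readings (device_id, payload) VALUES (?, ?)",
    [("dev-A", "r1"), ("dev-B", "r2"), ("dev-A", "r3")],
)
conn.commit()
first_device, first_rows = claim_device(conn)
second_device, second_rows = claim_device(conn)
```

Each call hands one worker an exclusive, in-order batch for a single device; this is also why the database sees so much traffic, since every claim is a select plus an update.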
This is starting to show performance problems and generates heavy database traffic, so we are looking for alternatives to using the database as a buffer.
Can object caching systems like memcached let us pull all the data for one device into a single process and prevent that data from being used by another process?
Or are there message queuing systems that would do this?
Or something else?
-EDIT-
Reading the data back for processing needs to be done by machines on different servers, so I'm looking for something that can cross application/process boundaries and maintain locks on some data items to preserve the order in which they are processed.
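One common way to get this guarantee without distributed locks at all is to route every reading for a given device to the same consumer with a stable hash, so per-device ordering falls out of the routing. A minimal sketch of that routing function (the partition count is an arbitrary assumption):

```python
# Stable device -> partition routing. Every server computes the same
# partition for the same device, so no cross-process lock is needed:
# one consumer owns each partition, and a device's readings always
# land in the same partition in arrival order.
import hashlib

def partition_for(device_id: str, num_partitions: int) -> int:
    """Map a device id to a partition, identically on every machine."""
    # Python's built-in hash() is salted per process (PYTHONHASHSEED),
    # so use a real digest for cross-process stability.
    digest = hashlib.sha1(device_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions
```

The trade-off is static ownership: if one partition's consumer falls behind, its devices queue up behind it, whereas the lock-based approach lets any free worker grab any free device.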
The problem with "a database" is that it is generally a single point of failure and a scalability bottleneck. You can scale databases, but it's difficult and costly depending on what scale you're talking about.
Typically, what you describe is implemented with messages. Message queues store and forward the messages to multiple readers or clients. You can scale out the number of readers (in addition to the number of queues and the communication amongst them), which is generally considered a very scalable architecture.
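The store-and-forward topology above can be mimicked in a few lines: one queue per partition, one worker per queue, so messages for a given device are always consumed in arrival order. This is a single-process stand-in, not real RabbitMQ usage; with a broker, the queues and workers would live on separate machines, but the shape is the same.

```python
# Toy model of a partitioned queue topology: each worker thread owns
# one queue, and a stable hash sends each device to exactly one queue,
# preserving per-device processing order with no locking on the data.
import queue
import threading
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4
queues = [queue.Queue() for _ in range(NUM_PARTITIONS)]
processed = defaultdict(list)      # device_id -> readings, in order
results_lock = threading.Lock()    # guards the shared results dict only

def route(device_id: str) -> int:
    # crc32 is process-independent, unlike Python's salted hash()
    return zlib.crc32(device_id.encode("utf-8")) % NUM_PARTITIONS

def worker(q: queue.Queue) -> None:
    while True:
        msg = q.get()
        if msg is None:            # shutdown sentinel
            break
        device_id, reading = msg
        with results_lock:
            processed[device_id].append(reading)

def publish(device_id: str, reading) -> None:
    queues[route(device_id)].put((device_id, reading))

threads = [threading.Thread(target=worker, args=(q,)) for q in queues]
for t in threads:
    t.start()

# Interleave readings from two devices.
for i in range(10):
    publish("dev-1", i)
    publish("dev-2", i * 10)

for q in queues:
    q.put(None)                    # stop every worker
for t in threads:
    t.join()
```

Since a single worker drains each FIFO queue, each device's readings come out in exactly the order they went in, even though the two devices are processed concurrently.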
I've worked with RabbitMQ to do exactly this type of thing, but there are many vendors with varying feature sets.