Update I updated my question to reflect the fact I’m working with a database.
I need to process user actions:
- The actions for each user change the user’s balance which is then persisted to a database.
- A user’s action must be processed sequentially. Otherwise, we might corrupt the balance in the database.
- Some actions are associated with 2 users in which case they can’t run in parallel with either user’s actions.
- The volume of actions per user varies considerably during the day.
- Update The processes are going to be distributed over several machines.
I am trying to find a way to distribute the actions between processes so that the above requirements are met.
Is there a known paradigm, architecture or algorithm that solves such a problem?
I’m looking for a solution that does not involve processes talking to each other except through a message queue or some other scalable mediator.
3
A simple greedy approach should fulfill your requirements. I assume you have an ordered input queue with all actions to process, and a list of actions associated with corresponding users currently under (parallel) processing (lets call these “users under processing”). Whenever the processing of one action is done, it is removed, and you check the next elements from your input queue which ones can be added as next without a collision with the users under processing. Note that you check the full queue in order, not just the top element.
You can add new actions to the processing list as long you have actions for users currently not under processing. If you need to restrict the number of processes in parallel to some number N, that’s easily done by not adding more than N actions to the currently processed acctions.
2
I would suggest you look into scheduling algorithms:
a list of scheduling algorithms by category here
and partial order relations to make sure the order of the actions of a given user is respected. You can generate a tree of dependencies using the relations which would help you in scheduling.