I am looking for advice on selecting or building a job-placement algorithm.
In my company we have a simple computing platform built on top of Kubernetes. Multiple clients send compute jobs to a single Kafka topic; multiple workers continuously poll the queue for new tasks, execute the jobs, and go back to pick up the next ones. Any worker can execute any job, so the system effectively has a single queue.
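For context, today's worker loop is roughly the following (heavily simplified; the topic name, the group id, the `execute` helper, and the use of confluent-kafka are illustrative rather than our actual code):

```python
import json
from confluent_kafka import Consumer

def execute(job):
    """Placeholder for our actual job runner."""
    print("running job", job.get("id"))

consumer = Consumer({
    "bootstrap.servers": "kafka:9092",
    "group.id": "workers",            # all workers share one consumer group
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["jobs"])          # one topic: any worker takes any job

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    job = json.loads(msg.value())     # jobs are assumed to be JSON payloads
    execute(job)
```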
I need to make this system data/cache aware. Imagine that each job requires some datasets [D1, D2, …, Dn]; when the job lands on a worker [Wi], the worker first needs to retrieve this data and cache it locally before it can start execution. When a new job comes in requiring some dataset [Dx], I want it to be assigned to a “worker pool” whose workers have already cached [Dx].
I am free to modify the architecture to implement the new functionality; for example, we could replace Kafka, use multiple topics, or introduce some lookup tables. One direction I have been sketching is below.
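This is a rough dispatcher sketch, not a finished design: jobs are routed to a topic derived from their dataset requirements, and topics are created at runtime as new dataset combinations appear. The topic-naming scheme, the hash, and the partition/replication counts are all assumptions on my part:

```python
import hashlib
import json
from confluent_kafka import Producer
from confluent_kafka.admin import AdminClient, NewTopic

BOOTSTRAP = "kafka:9092"
admin = AdminClient({"bootstrap.servers": BOOTSTRAP})
producer = Producer({"bootstrap.servers": BOOTSTRAP})
known = set(admin.list_topics(timeout=5).topics)   # topics that already exist

def topic_for(datasets):
    # Deterministic key: every job needing the same data maps to one topic.
    key = hashlib.sha1(",".join(sorted(datasets)).encode()).hexdigest()[:12]
    return f"jobs-{key}"

def dispatch(job):
    topic = topic_for(job["datasets"])
    if topic not in known:                         # new dataset combination
        futures = admin.create_topics(
            [NewTopic(topic, num_partitions=1, replication_factor=1)])
        futures[topic].result()                    # wait until the topic exists
        known.add(topic)
    producer.produce(topic, value=json.dumps(job).encode())
    producer.flush()

dispatch({"id": 1, "datasets": ["D1", "D2"]})
```

The lookup-table variant would be the same idea with an explicit map from dataset key to topic (in a small DB, or ZooKeeper/etcd) instead of a pure hash.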
Requirements:
- The number of datasets is not known in advance; new datasets arrive at runtime.
- Assume that a worker can store/cache an unlimited number of datasets.
- It has to be a pull-based mechanism, where the worker itself selects a queue/topic to get its next task from (see the sketch after this list).
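On the worker side, the kind of pull mechanism I have in mind looks roughly like this: the worker itself decides which per-dataset topics to consume from, adopting another topic whenever it runs idle, and warming its cache on the first job that needs data it does not yet have. The adoption policy here (pick any topic it does not already serve) is a naive placeholder, and `fetch_and_cache`/`execute` are hypothetical helpers:

```python
import json
from confluent_kafka import Consumer

def fetch_and_cache(datasets):
    """Placeholder: download the datasets to local storage."""
    print("caching", datasets)

def execute(job):
    """Placeholder for our actual job runner."""
    print("running job", job.get("id"))

consumer = Consumer({
    "bootstrap.servers": "kafka:9092",
    "group.id": "workers",
    "auto.offset.reset": "earliest",
})

cached = set()     # dataset ids held locally on this worker
adopted = set()    # per-dataset topics this worker has chosen to serve
idle_polls = 0

while True:
    msg = consumer.poll(1.0)
    if msg is None:
        idle_polls += 1
        if idle_polls >= 10:          # nothing to do: adopt another topic
            topics = consumer.list_topics(timeout=5).topics
            candidates = [t for t in topics
                          if t.startswith("jobs-") and t not in adopted]
            if candidates:
                adopted.add(candidates[0])
                consumer.subscribe(list(adopted))  # worker picks its queues
            idle_polls = 0
        continue
    if msg.error():
        continue
    idle_polls = 0
    job = json.loads(msg.value())
    missing = [d for d in job["datasets"] if d not in cached]
    if missing:                       # first job for this data warms the cache
        fetch_and_cache(missing)
        cached.update(missing)
    execute(job)
```

Within each adopted topic, Kafka's normal consumer-group partition assignment spreads the load across the workers that adopted it, which is roughly the “worker pool” I am after. The part I am least sure about is the adoption policy, i.e. which topic an idle worker should pick, which is why I suspect a known algorithm exists for this.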
Perhaps an algorithm like this already exists. I would appreciate any pointers or references.
Thank you.