I’m looking for an algorithm to handle the following problem, which I’m (for now) calling the “bad apple” algorithm.
The problem
- I’ve got N processes running in M sandboxes, where N >> M.
- It’s impractical to give each process its own sandbox.
- At least one of those processes is badly-behaved, and is bringing down the entire sandbox, thus killing all of the other processes in the same sandbox.
If it was a single badly-behaved process, then I could use a simple bisection to put half of the processes in one sandbox, and half in another sandbox, until I found the miscreant.
The question
If more than one process is badly-behaved — including the possibility that they’re all badly-behaved — does this naive algorithm “work”? Is it guaranteed to work within some sensible bounds?
Simplifications
For the sake of argument, let’s assume that a bad process brings down its sandbox instantaneously, and a good process never does.
6
If you were finding one bad apple, then you could apply the algorithm:
- Divide N into M groups
- Perform the test on each group.
- If group size greater than 1, return to step 1, replacing N with number of items in bad group.
Likewise, if you were finding the one “good” apple, then the same algorithm applies, but rather finding the good group.
This algorithm has a O(log_M (N)) performance rate but it depends on the fact that there is only one bad apple.
If you don’t know how many good/bad apples there are, then you could use the following algorithm:
For each M processes in N
Test M processes
This is the worst case scenario, but it runs in O(N/M)
time (or O(N)
depending on if you consider a single pass as a single test or as a collection of tests performed in parallel). All things considered, this is not a bad approach by any means.
There may be algorithms which perform better than this, but it requires that you know how many bad apples/good apples there are in the batch. Not knowing this factor, while I can’t prove it, I’d be willing to bet that you cannot do better than the latter algorithm listed above.
Hope that helps!
Edit: From a practical perspective, I understand that some of these operations are not easily performed. That’s true, however, the unfortunate reality is that you cannot determine the bad apple strictly from which processes were running on the sandbox that was running at that moment without being to at least activate or disactivate processes. If the question pertains to the algorithm, I think I have answered that. If the question pertains to how to deal with a situation like this, then perhaps the question would be better suited for superuser SE.
5