I have a list of objects that all require exactly the same filtering: basically a set of conditionals in a function that decides whether an object is “good” or “bad”. I want to keep all of the “good” objects in the list. Let’s say I have 1000 objects in this list and the filter function is already written. What languages or features would maximize how parallel this operation could be? I have played with Python’s multiprocessing, which does improve performance, but it still left me unsatisfied. I suppose this is an “embarrassingly parallel” problem, as no state needs to be shared.
I’m not entirely sure how best to answer this question, but two possibilities immediately come to mind:
Scala
As you are already describing the problem in a functional way and explicitly state that there is no shared state, Scala’s parallel collections API seems like a perfect match for the problem. It may not give you top performance (see below for that), but it certainly requires only a tiny amount of effort to set up. Something like the following is really all you need:
myList.par.filter(goodBadCondition)
Calling .par turns any collection into a parallelized one, which executes operations like filter in parallel using the JVM’s fork/join support. Note that the parallel collection operations are order-preserving: the elements that pass the condition appear in the result in the same order as in the original list.
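One caveat for current Scala: since 2.13, .par is no longer part of the standard library. It lives in the separate scala-parallel-collections module and needs `import scala.collection.parallel.CollectionConverters._`. As a stdlib-only sketch of the same idea (not the parallel collections implementation itself), the list can be cut into at most one chunk per core, each chunk filtered in a `Future`, and the results concatenated in order. The `parFilter` helper and the body of `goodBadCondition` are hypothetical stand-ins for the question’s real filter:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration
import scala.concurrent.ExecutionContext.Implicits.global

object ParallelFilter {
  // Hypothetical stand-in for the real "good"/"bad" check.
  def goodBadCondition(x: Int): Boolean = x % 3 == 0

  // Hypothetical helper: splits the input into at most one chunk per
  // available core, filters each chunk in its own Future, then
  // concatenates the filtered chunks in their original order.
  def parFilter[A](xs: Vector[A], p: A => Boolean): Vector[A] = {
    val cores = Runtime.getRuntime.availableProcessors()
    val chunkSize = math.max(1, (xs.size + cores - 1) / cores)
    val futures = xs.grouped(chunkSize).toVector.map(chunk => Future(chunk.filter(p)))
    // Future.sequence keeps the order of its input, so the result is stable.
    Await.result(Future.sequence(futures), Duration.Inf).flatten
  }

  def main(args: Array[String]): Unit = {
    val objects = (1 to 1000).toVector
    val good = parFilter(objects, goodBadCondition)
    println(good.take(5)) // Vector(3, 6, 9, 12, 15)
    println(good.size)    // 333
  }
}
```

Because the chunks are reassembled in their original order, this keeps the same order-preserving behaviour as .par. With a cheap condition and only 1000 elements, scheduling overhead may dominate, so it is worth timing this against a plain sequential filter.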
GPU-based
If you really want to squeeze out maximum performance, then letting the GPU perform the filtering may be the way to go. In this case, most of your speed loss comes from having to send data to and from the GPU. While the operation you want to perform is a perfect match for something like CUDA, I am not entirely sure that it is worthwhile. For example, if your condition is very simple and extremely fast to check, then the overhead of transferring the whole list to the GPU may not pay off against a quad-core CPU that can process the whole list directly in main memory.
On the other hand, if your filter condition is costly to check, then you can check far more elements at the same time than on a CPU. Depending on the card you use, you get anywhere from several dozen to hundreds or even thousands of GPU cores to perform the work.
However, this approach takes far more implementation effort than the one above. You also need much more knowledge of GPU programming, and you will need a lot of code for such a supposedly trivial task (though newer frameworks that simplify this are starting to appear).