After using a `Pool` from Python's `multiprocessing` module to parallelize some computationally intensive work, I want to retrieve statistics that were kept local to each spawned process. I have no real-time interest in these statistics, so I do not want to bear the overhead of keeping them in a synchronized data structure.
I've found suggestions to run a second `pool.map()` with a different function that returns the state local to its worker. I believe this is incorrect, since there is no guarantee that the second invocation distributes exactly one job to every worker process in the pool. Is there a mechanism that would achieve this?
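For concreteness, the suggested pattern would look roughly like the sketch below (`collect_stats` and `pool_size` are names I've made up; `local_stats` is the per-process dict from the skeleton further down):

```python
import multiprocessing as mp

local_stats = {"success": 0, "fails": 0}  # each worker process has its own copy

def collect_stats(_):
    # Runs in whichever worker happens to pick up the task and returns
    # that worker's process-local counters.
    return local_stats

if __name__ == "__main__":
    pool_size = 2
    with mp.Pool(processes=pool_size) as pool:
        # ... the real workload would run here ...
        # Intended: one collection task per worker. In reality nothing stops
        # an idle worker from grabbing several of these tasks while another
        # worker gets none, so some stats come back duplicated and others
        # are missing entirely.
        all_stats = pool.map(collect_stats, range(pool_size))
```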
Here is a skeleton snippet; it's unclear what can be done after `imap_unordered()` completes:
```python
import multiprocessing as mp
import random

# Module-level dict: every worker process ends up with its own copy,
# so increments here are local to that process.
local_stats = {"success": 0, "fails": 0}

def do_work(_):
    if random.choice([True, False]):
        local_stats["success"] += 1
    else:
        local_stats["fails"] += 1

if __name__ == "__main__":
    with mp.Manager() as manager:
        with mp.Pool(processes=2) as pool:
            results = list(pool.imap_unordered(do_work, range(1000)))
            # After .imap_unordered() completes, aggregate "local_stats" from each
            # process in the pool, either by retrieving each worker's local_stats
            # or by having the workers push those stats to the main process.
            # ???
```
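For what it's worth, the only variation I can think of that might make the second map reliable is gating the collection tasks behind a `multiprocessing.Barrier` sized to the pool, so that a worker holding one collection task blocks until every worker holds one. This is an untested sketch (`init_worker` and `collect_stats` are my own names), and I'm not sure it is deadlock-safe in general:

```python
import multiprocessing as mp
import random

NUM_WORKERS = 2
barrier = None  # set in each worker by init_worker
local_stats = {"success": 0, "fails": 0}

def init_worker(b):
    global barrier
    barrier = b

def do_work(_):
    if random.choice([True, False]):
        local_stats["success"] += 1
    else:
        local_stats["fails"] += 1

def collect_stats(_):
    # A worker that picks up one of these tasks blocks here; a blocked worker
    # cannot take a second task, so the NUM_WORKERS tasks must land on
    # NUM_WORKERS distinct workers before any of them proceeds.
    barrier.wait()
    return dict(local_stats)

if __name__ == "__main__":
    b = mp.Barrier(NUM_WORKERS)
    with mp.Pool(NUM_WORKERS, initializer=init_worker, initargs=(b,)) as pool:
        list(pool.imap_unordered(do_work, range(1000)))
        # chunksize=1 so each collection task is dispatched individually
        per_worker = pool.map(collect_stats, range(NUM_WORKERS), chunksize=1)
    totals = {key: sum(s[key] for s in per_worker) for key in ("success", "fails")}
    print(per_worker, totals)
```

Even if this works, it feels like fighting the pool's scheduler rather than using a supported mechanism, hence the question.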