I'm processing a list of arrays, and the workflow could be parallelised at two levels: one worker process per dataset in the list, and, within each dataset, multiple worker processes to handle the different slices of the array.
I've read that the worker processes in a multiprocessing Pool are generally not allowed to spawn child processes of their own, and would appreciate advice on whether it's possible to overcome that limitation, so that parallel processes can trigger their own parallel processes.
I've written a short example for reference, followed by a couple of workaround sketches I've been considering.
# Can a pool's workers run their own pools?
import numpy as np
from multiprocessing import Pool
def do_something(
    num: int,
    data: np.ndarray,
    num_procs: int = 1,
):
    # Inner level of parallelism: one task per element of this dataset.
    args_list = []
    for n in range(len(data)):
        args_list.append([data[n], num])
    with Pool(processes=num_procs) as pool:
        results = pool.starmap(do_another_thing, args_list)
    return results
def do_another_thing(
    value1: int,
    value2: int,
):
    return value1 * value2
# Main
if __name__ == "__main__":
    # Example dataset: a list of arrays
    data = [np.random.randint(0, 255, (20)) for n in range(20)]
    args_list = []
    for i in range(len(data)):
        args_list.append([i, data[i]])
    # Outer level of parallelism: one task per dataset.
    with Pool(processes=4) as pool:
        results = pool.starmap(do_something, args_list)
    for result in results:
        print(result)
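For context on the limitation: as far as I can tell, the inner Pool above should fail with an AssertionError along the lines of "daemonic processes are not allowed to have children", because Pool workers are daemonic. The fallback I've sketched below (my own idea, names included) flattens both levels into a single pool over (value, dataset index) pairs, so that no worker ever needs to spawn children of its own:

# Fallback sketch: flatten both levels of parallelism into one pool.
import numpy as np
from multiprocessing import Pool

def do_another_thing(
    value1: int,
    value2: int,
):
    return value1 * value2

if __name__ == "__main__":
    data = [np.random.randint(0, 255, (20)) for n in range(20)]
    # One task per (element, dataset index) pair instead of nested pools.
    args_list = [(value, i) for i, arr in enumerate(data) for value in arr]
    with Pool(processes=4) as pool:
        flat_results = pool.starmap(do_another_thing, args_list)
    print(flat_results[:10])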
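I've also wondered whether the restriction applies only to Pool workers: plain multiprocessing.Process objects are non-daemonic by default, so perhaps the outer level could use those, with each one free to open its own inner Pool. A sketch of that idea (the Queue for collecting results is my own addition, and I haven't verified this behaves the same under every start method):

# Sketch: non-daemonic Process objects at the outer level, each of which
# should be free to create an inner Pool of its own.
import numpy as np
from multiprocessing import Pool, Process, Queue

def do_another_thing(
    value1: int,
    value2: int,
):
    return value1 * value2

def do_something(num, data, out, num_procs=1):
    args_list = [(value, num) for value in data]
    # This process is not daemonic, so (I believe) it may have children.
    with Pool(processes=num_procs) as pool:
        out.put((num, pool.starmap(do_another_thing, args_list)))

if __name__ == "__main__":
    data = [np.random.randint(0, 255, (20)) for n in range(20)]
    out = Queue()
    procs = [Process(target=do_something, args=(i, arr, out)) for i, arr in enumerate(data)]
    for p in procs:
        p.start()
    # Drain the queue before joining, to avoid blocking on a full pipe.
    results = dict(out.get() for _ in procs)
    for p in procs:
        p.join()
    print(results[0])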
Advice would be much appreciated. Thanks!