I am trying to implement parallel processing in Python.
The CDSW environment has options for 1 vCPU, 2 vCPU, …, 4 vCPU.
Irrespective of the option selected, it reports 16 cores (nproc --all).
I was able to run 4 parallel jobs with joblib's Parallel/delayed, roughly like this (a runnable sketch follows):
Parallel(n_jobs=4, prefer="threads")(delayed(execute_function)(…) for … in …)
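Here is a minimal, self-contained version of what works for me; execute_function and the inputs here are just placeholders for my real workload (described under Additional Info):

    from joblib import Parallel, delayed

    def execute_function(item):
        # placeholder; the real function does PySpark/pandas processing
        return item * 2

    inputs = range(8)   # placeholder input list

    # this works: 4 parallel jobs, threading backend
    results = Parallel(n_jobs=4, prefer="threads")(
        delayed(execute_function)(item) for item in inputs
    )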
Running more than 4 parallel jobs causes a memory overload error.
I was hoping for a way to make use of all 4 vCPUs, with 4 cores each, i.e. 16 parallel jobs, along the lines of the sketch below.
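This is roughly what I am hoping will work; I am assuming os.cpu_count() matches what nproc --all reports, and execute_function/inputs are again placeholders:

    import os
    from joblib import Parallel, delayed

    def execute_function(item):
        return item * 2   # placeholder for the real PySpark/pandas work

    inputs = range(32)    # placeholder input list

    # assuming os.cpu_count() matches nproc --all (16 in my session)
    n_cores = os.cpu_count()

    # what I am hoping for: one job per reported core, i.e. 16 parallel jobs
    results = Parallel(n_jobs=n_cores, prefer="threads")(
        delayed(execute_function)(item) for item in inputs
    )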
Additional Info:
Python 3.6
execute_function => PySpark and pandas processing code (a rough, hypothetical sketch of the kind of work it does follows).
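For context, something like this is the kind of work each call does; the CSV input and the key/value column names are hypothetical placeholders, and the PySpark part of the real function is omitted:

    import pandas as pd

    def execute_function(path):
        # hypothetical: read one input file and aggregate it with pandas;
        # the real function also uses PySpark, omitted here for brevity
        df = pd.read_csv(path)
        return df.groupby("key").agg({"value": "sum"})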