I’m attempting to accelerate a piece of Python code using Numba. Within this code, I’m also employing mpi4py for parallelization. However, I’ve encountered an error. I’ve attempted to provide a minimal error-reproducing example below:
import numpy as np
from numba import njit, prange
from mpi4py import MPI
#####MPI setting#####
comm=MPI.COMM_WORLD
rank=comm.Get_rank()
size=comm.Get_size()
N_theta_to_scan=1000
n_per_proc=int(N_theta_to_scan/size)
n_more=int(N_theta_to_scan%size)
if rank<n_more:
start=rank*(n_per_proc+1)
number_to_cal=n_per_proc+1
else:
start=n_more*(n_per_proc+1)+(rank-n_more)*n_per_proc
number_to_cal=n_per_proc
#####function#####
@njit(parallel=True)
def func1(na,nb,nc):
to_sum=np.zeros(na*nb*nc)
for a in prange(0,na):
for b in prange(0,nb):
for c in prange(0,nc):
to_sum[a*nb*nc+b*nc+c]=a*b*c
out=np.sum(to_sum)
return out
@njit(parallel=True)
def func2(start,number_to_cal):
to_sum=np.zeros(number_to_cal)
for i in prange(start,start+number_to_cal):
to_sum[i-start]=func1(i,i*i,i*i*i)
out2=np.sum(to_sum)
return out2
#####main section#####
to_be_gather=np.array([func2(start,number_to_cal)])
gatheres=np.zeros(0)
comm.Reduce(to_be_gather,gatheres,op=MPI.SUM)
The error occurs for this segment of the code:
MemoryError: Allocation failed (probably too large).
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/share/workspace/wuliang/hanlin/test/test.py", line 38, in <module>
to_be_gather=func2(start,number_to_cal)
^^^^^^^^^^^^^^^^^^^^^^^^^^
SystemError: CPUDispatcher(<function func2 at 0x2aed19d7b420>) returned a result with an exception set
Next, I attempted to remove func1()
and modified func2()
by replacing to_sum[i-start]=func1(i,i*i,i*i*i)
with to_sum[i-start]=i
, which resulted in successful execution. Additionally, I tried running the code with only rank 0 executing the main function, i.e., running the code under the condition if rank==0:
, which also ran successfully. I’m wondering where the error lies, and what modifications should I make to achieve the functionality of the current code?