I have been tasked with parallelizing a piece of code for a huge project I am taking part in. The code (code.py) goes something like this:
```python
import numpy as np  # importing modules section
from tqdm import tqdm
import time
import os
from astropy.io import fits  # assumed source of fits.open (this import is not shown in the original)
from Auxiliar_py.functions import func1  # import func1 from functions.py within the Auxiliar_py folder
...  # many lines of code omitted for conciseness
constant1 = 10  # constants section
constant2 = 20
loops = fits.open('loops.fits')
loops_list = len(loops)
...

def mainFunction(arg1, arg2, arg3):  # the main function (further arguments omitted)
    var01 = constant1 + 1
    var02 = func1(var01)
    ...
    max_attempts = 5
    for i in tqdm(range(len(loops))):
        attempt = 0  # reset the retry counter for every i
        while attempt < max_attempts:
            try:
                for p in range(5):  # <------ the loop I want to parallelize
                    var11 = func1(var01)  # <------
                    var12 = func1(var11)  # <------
                    var13 = func2(var01, var12)  # <------
                    ...  # <------ this entire section takes 400+ lines of code
                break  # success: leave the retry loop
            except Exception as e:
                print(f"An error occurred: {e}")
                attempt += 1
                print(f"Attempt {attempt} out of {max_attempts}")
        var21 = var11 + 20
        var22 = func3(var13)
        ...
```
To my horror, there are nested loops within the desired function, and they take a HUGE toll on performance. Using Ray, I have been able to parallelize other sections of the code, including some sections within `for p in range(5)`, but this loop as a whole bugs me the most. Is there a way to parallelize that `for` loop using Ray without rewriting the 400+ lines of code within it?
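For reference, here is a minimal sketch of the general Ray pattern, assuming that each `p` iteration (and each `i` iteration) is independent of the others: submit one task per iteration with `.remote()`, which returns an `ObjectRef` immediately, then block once on `ray.get` for all results. Because the iterations are assumed independent, the `(i, p)` pairs are flattened so that all of them can run at once. `loop_body`, `run_all`, and the arithmetic inside `loop_body` are hypothetical placeholders for the 400-line body; the serial fallback exists only so that the snippet still runs where Ray is not installed.

```python
from itertools import product

try:
    import ray
except ImportError:  # fallback so the sketch also runs without Ray installed
    ray = None


def loop_body(i, p, var01):
    """Hypothetical stand-in for one iteration of the 400-line `for p` body."""
    var11 = var01 + i   # placeholder for func1(var01)
    var12 = var11 * p   # placeholder for func1(var11)
    return i, p, var12


def run_all(n_loops, var01, n_p=5):
    # Every (i, p) combination becomes its own unit of work.
    pairs = list(product(range(n_loops), range(n_p)))
    if ray is None:
        return [loop_body(i, p, var01) for i, p in pairs]
    if not ray.is_initialized():
        ray.init()
    remote_body = ray.remote(loop_body)  # same function, now schedulable as a Ray task
    refs = [remote_body.remote(i, p, var01) for i, p in pairs]  # returns immediately
    return ray.get(refs)  # blocks until all tasks finish; order matches `refs`


if __name__ == "__main__":
    results = run_all(n_loops=3, var01=11)
    print(results[:3])
```

If the `i` iterations turn out to depend on one another, the same pattern can be applied to the inner `p` loop alone, calling `ray.get` once per `i`.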
The solutions I could think of, but am too scared to attempt, are:

- Turn all of the functions and processes within the loop into one function, decorate it with `@ray.remote`, and call it in the loop to enable parallelization. This would, however, require me to manually list the inputs and outputs of the 400+ lines of code so that I can define them as a function.
- Rewrite everything.
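The first option may need less bookkeeping than it seems. As a sketch under stated assumptions, the body can take a single dictionary that bundles every read-only input, so its signature stays short however many variables the 400 lines read, and it can return a dictionary holding only the values the code after the loop uses (`var11` and `var13` here). The manual `while`/`try` retry loop can also be handed to Ray through the real `ray.remote` options `max_retries` and `retry_exceptions`. `loop_body`, `run_loop`, the `shared` keys, and the arithmetic are hypothetical placeholders, and the serial fallback (which does not retry) exists only so that the snippet runs without Ray.

```python
try:
    import ray
except ImportError:  # fallback so the sketch also runs without Ray installed
    ray = None


def loop_body(p, shared):
    """Hypothetical stand-in for the 400-line body of `for p in range(5)`.

    `shared` bundles every read-only input the body needs, so the signature
    stays short no matter how many variables the body reads.
    """
    var11 = shared["var01"] + p          # placeholder for func1(var01)
    var12 = var11 * shared["constant2"]  # placeholder for func1(var11)
    var13 = var12 - shared["constant1"]  # placeholder for func2(var01, var12)
    # Return only what the code after the loop uses (var21, var22, ...).
    return {"var11": var11, "var13": var13}


def run_loop(shared, n_p=5):
    if ray is None:  # serial fallback: no retries here
        return [loop_body(p, shared) for p in range(n_p)]
    if not ray.is_initialized():
        ray.init()
    # 1 initial try + 4 retries = 5 attempts, like max_attempts = 5.
    # retry_exceptions=True makes Ray re-run the task when it raises.
    task = ray.remote(max_retries=4, retry_exceptions=True)(loop_body)
    shared_ref = ray.put(shared)  # store the inputs once instead of copying them per task
    return ray.get([task.remote(p, shared_ref) for p in range(n_p)])


if __name__ == "__main__":
    shared = {"var01": 11, "constant1": 10, "constant2": 20}
    outputs = run_loop(shared)
    print(outputs[0])  # {'var11': 11, 'var13': 210}
```

Passing `shared_ref` rather than `shared` means the dictionary is serialized once. When an `ObjectRef` is passed directly as a task argument, Ray hands the task the stored value, so `loop_body` itself needs no Ray-specific code and can still be called and tested serially.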
I'm new to parallelization, so any answers and comments are hugely appreciated. Thank you all in advance.
Note: While other packages/modules for parallelization are available, Ray is the most suitable for this project as a whole. I would greatly appreciate solutions that stick to Ray rather than switching to other packages.