I’m facing a performance problem with my Python code: execution is too slow for my application. I’ve identified the bottleneck and I’m looking for advice on how to optimize it.
Here’s a simplified version of the code:
import numpy as np
def merge_arrays(arr: list, new: list) -> list:
    if len(arr) != len(new):
        raise ValueError(f'Length of <arr> is {len(arr)} but length of <new> is {len(new)}')
    # pair each element of arr with its counterpart in new, then flatten everything
    return transform_array(zip(arr, new), [])

def transform_array(arr, r: list) -> list:
    # recursively flatten nested lists/tuples/arrays into r, one scalar at a time
    for x in arr:
        if isinstance(x, (list, tuple, np.ndarray)):
            transform_array(x, r)
        else:
            r.append(x)
    return r
# sample data for this simplified example (shapes mimic my real inputs)
COUNT = 100000
pips = [(np.arange(5), np.arange(5)) for _ in range(COUNT)]
b = [np.arange(50, 65) for _ in range(COUNT)]
ang = [np.arange(50, 55) for _ in range(COUNT)]
dist = [np.arange(50, 55) for _ in range(COUNT)]
result = [
    merge_arrays(y, [np.append(b[i][1:], [dist[i][p], ang[i][p]]) for p, i in enumerate(x)])
    for x, y in pips
]
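For clarity, this is what a single element of pips produces; the sizes follow directly from the sample data above:

x, y = pips[0]
new = [np.append(b[i][1:], [dist[i][p], ang[i][p]]) for p, i in enumerate(x)]
flat = merge_arrays(y, new)
print(len(flat))  # 85 scalars: each of the 5 (index, row) pairs contributes 1 + 16 values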
The problem shows up with large datasets, i.e. when the input lists contain many elements: execution times become far too slow for my application.
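To put a number on it, timing a single merge and projecting to the full COUNT already shows the problem. This is only a measurement sketch; the absolute time obviously depends on the machine:

import timeit

x, y = pips[0]  # one representative element
per_call = timeit.timeit(
    lambda: merge_arrays(y, [np.append(b[i][1:], [dist[i][p], ang[i][p]]) for p, i in enumerate(x)]),
    number=1000,
) / 1000
print(f"~{per_call * COUNT:.1f} s projected for all {COUNT} elements")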
I’ve tried to optimize the code by using list comprehensions and avoiding unnecessary condition checks, but the performance is still not satisfactory.
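Roughly, the attempt looked like the sketch below (transform_array_attempt is just an illustrative name, not my exact code): extend with a whole 1-D array at once instead of recursing into it and appending scalar by scalar.

def transform_array_attempt(arr, r: list) -> list:
    # attempted variant: bulk-extend already-flat arrays instead of
    # appending element by element and re-checking each scalar
    for x in arr:
        if isinstance(x, np.ndarray) and x.ndim == 1:
            r.extend(x)
        elif isinstance(x, (list, tuple, np.ndarray)):
            transform_array_attempt(x, r)
        else:
            r.append(x)
    return r

The output is the same for this data, but as noted the overall run time is still not satisfactory.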
I’m looking for suggestions on how to refactor or optimize this code to improve its performance, especially in the transform_array function where the bottleneck seems to be located.
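For what it’s worth, this is roughly how I narrowed it down to transform_array; the profile is run on a slice of pips so the output stays readable:

import cProfile

cProfile.run(
    "[merge_arrays(y, [np.append(b[i][1:], [dist[i][p], ang[i][p]]) "
    "for p, i in enumerate(x)]) for x, y in pips[:10000]]",
    sort="cumtime",
)
# in my runs, most of the cumulative time is attributed to transform_array
# and the per-element appends inside it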
Any insights or alternative approaches would be greatly appreciated. Thank you!