I have a numpy array with times in the first column and signal values in the remaining columns. The objective is to run a peak-finding algorithm on each of the signal columns and get the peaks and their corresponding times. My code looks like this:
import numpy as np
from peakutils import indexes
arr = stress_data.to_numpy()
times_all = arr[:,0]
peaks_all = []
width = 125
for i in range(1, np.shape(arr)[1]):
    x = arr[:, i]
    xf = x - np.mean(x)
    threshold = 0.1 * np.average(xf) / np.max(xf)
    # Find x-coordinates of peaks in signal
    peaks = indexes(xf, thres=threshold, min_dist=width)
    sg = [times_all[peaks], xf[peaks]]
    peaks_all.append(sg)
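For reference, each entry of peaks_all is a [peak_times, peak_values] pair for one signal column, e.g.:

# Peak times and peak values found in the 3rd signal column (column index 3 of arr)
peak_times_col3, peak_values_col3 = peaks_all[2]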
This code works as expected, but the problem is that it takes a long time to run.
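To give a sense of where the time goes, the per-column cost can be checked in isolation like this (just a sketch; the 1,000,000-row slice is an arbitrary size to keep the test quick):

import time

# Time peakutils.indexes on a single, truncated signal column
x = arr[:1_000_000, 1]
xf = x - np.mean(x)
threshold = 0.1 * np.average(xf) / np.max(xf)

t0 = time.perf_counter()
peaks = indexes(xf, thres=threshold, min_dist=width)
print(f"indexes() on {len(xf)} samples took {time.perf_counter() - t0:.2f} s")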
I tried optimizing this with np.apply_along_axis by defining a processing function for it, but doing it that way took even longer. Here is the code I used for that:
def process_column(x):
    xf = x - np.mean(x)
    threshold = 0.1 * np.average(xf) / np.max(xf)
    # Find x-coordinates of peaks in signal
    peaks = indexes(xf, thres=threshold, min_dist=width)
    return [times_all[peaks], xf[peaks]]

peaks_all = np.apply_along_axis(process_column, axis=0, arr=arr[:, 1:])
About the numpy array: besides the time column, there are 10 columns to be processed, and there are 10.8 million rows. Any help with optimizing this would be appreciated.
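For anyone who wants to reproduce the scale without my data, a random stand-in with the same layout (one time column followed by 10 signal columns) could look like this; the values are obviously not my real data, just an assumption about the structure:

rng = np.random.default_rng(0)
n_rows, n_signals = 10_800_000, 10   # shrink n_rows for a quicker test

times = np.arange(n_rows, dtype=np.float64)       # stand-in time column
signals = rng.normal(size=(n_rows, n_signals))    # stand-in signal columns
arr = np.column_stack([times, signals])           # same name/layout as in the snippets above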