My function takes, at each position, the minimum over a slice of the dataframe that grows in length with each iteration, and the maximum over a slice that shrinks in length with each iteration.
The dataframe these calculations run on is itself a subset of a bigger dataframe, so the result is a nested loop that increases the time complexity dramatically.
import numpy as np
import pandas as pd

def drawdown(result_df, dict_dfs):
    last_year_df = pd.DataFrame(data=np.nan, index=result_df.index, columns=result_df.columns)
    for idx in range(len(result_df)):
        for stock in result_df.columns:
            # Trailing window of at most 250 rows ending at the current date
            past_date_idx = max(0, idx - 250)
            past_date = result_df.index[past_date_idx]
            current_date = result_df.index[idx]
            last_year = dict_dfs['close'].loc[past_date:current_date, stock]
            drawdowns = []
            for i in range(len(last_year)):
                # Min over the prefix [:i+1], max over the suffix [i:]
                rolling_min = last_year.iloc[:i + 1].min()
                rolling_max = last_year.iloc[i:].max()
                if rolling_min != 0:
                    drawdowns.append((rolling_max - rolling_min) / rolling_min)
            # .loc assignment instead of chained .iloc[idx][stock], which does
            # not reliably write back to the dataframe
            last_year_df.loc[current_date, stock] = np.median(drawdowns)
    return last_year_df
Given this code, is there any function that can help me improve the speed? If so, what changes should I make so that the logic stays the same, but instead of loops I use vectorised operations?
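For example, this is roughly what I imagine a vectorised inner loop could look like, assuming pandas' cummin/cummax match my prefix/suffix slices (an untested sketch; window_drawdown is just a name I made up):

import pandas as pd

def window_drawdown(last_year: pd.Series) -> float:
    # Sketch: cummin() gives the min of last_year.iloc[:i+1] at every
    # position i in one pass, and reversing before cummax() gives the
    # max of last_year.iloc[i:], so the innermost loop disappears.
    prefix_min = last_year.cummin()
    suffix_max = last_year[::-1].cummax()[::-1]
    dd = (suffix_max - prefix_min) / prefix_min
    dd = dd[prefix_min != 0]   # same guard as `if rolling_min != 0`
    return dd.median()         # NaN if every prefix min was zero

The per-date body would then shrink to last_year_df.loc[current_date, stock] = window_drawdown(last_year), but I am not sure whether this is the right approach or whether the outer loops can be removed too.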