I have created a pandas dataframe as follows:
import pandas as pd
import numpy as np
ds = { 'trend' : [1,1,1,1,2,2,3,3,3,3,3,3,4,4,4,4,4], 'price' : [23,43,56,21,43,55,54,32,9,12,11,12,23,3,2,1,1]}
df = pd.DataFrame(data=ds)
The dataframe looks as follows:
display(df)
trend price
0 1 23
1 1 43
2 1 56
3 1 21
4 2 43
5 2 55
6 3 54
7 3 32
8 3 9
9 3 12
10 3 11
11 3 12
12 4 23
13 4 3
14 4 2
15 4 1
16 4 1
I have saved the dataframe to a .csv file called df.csv:
df.to_csv("df.csv", index = False)
I need to create a new field called ema2
which:
-
iterates through each and every record of the dataframe
-
calculates the Exponential Moving Average (EMA) by considering the price observed at each iteration and the prices (EMA length is 2 in this example) observed in the previous trends.
For example: -
I iterate at record 0 and the EMA is NaN (missing).
-
I iterate at record 1 and the EMA is still NaN (missing)
-
I Iterate at record 12 and the EMA is 24.20 (it considers price at record 3, price at record 5 and price at record 12
-
I Iterate at record 13 and the EMA is 13.53 (it considers price at record 3, price at record 5 and price at record 13
-
I Iterate at record 15 and the EMA is 12.46 (it considers price at record 3, price at record 5 and price at record 15
and so on …..
I have written the following code:
time_window = 2
ema= []
for i in range(len(df)):
ds = pd.read_csv("df.csv", nrows=i+1)
d = ds.groupby(['trend'], as_index=False).agg(
{'price':'last'})
d['ema2'] = d['price'].ewm(com=time_window - 1, min_periods=time_window).mean()
ema.append(d['ema2'].iloc[-1])
df['ema2'] = ema
Which produces the correct dataframe:
print(df)
trend price ema2
0 1 23 NaN
1 1 43 NaN
2 1 56 NaN
3 1 21 NaN
4 2 43 35.666667
5 2 55 43.666667
6 3 54 49.571429
7 3 32 37.000000
8 3 9 23.857143
9 3 12 25.571429
10 3 11 25.000000
11 3 12 25.571429
12 4 23 24.200000
13 4 3 13.533333
14 4 2 13.000000
15 4 1 12.466667
16 4 1 12.466667
The problem is that when the dataframe has millions of records: it takes a very long time to run.
Does anyone know how to get the same results in a quick, efficient way, please?