First of all I’m not sure if it is the correct title. Feel free to suggest a better one.
This is my DataFrame:
import pandas as pd
df = pd.DataFrame(
{
'a': [100, 100, 102, 102, 106, 106, 106, 107, 107, 107]
}
)
And this is the expected output. I want to create column b
:
a b
0 100 100
1 100 100
2 102 103
3 102 103
4 106 106.09
5 106 106.09
6 106 106.09
7 110 110
8 110 110
9 110 110
The process is as follows:
a) Groups are defined by column a
.
b) Each group must be AT LEAST 3 percent higher than the previous group. That is the output of column b
.
For example:
First group is 100. So the process starts for the next group.
100 * 1.03 = 103
The next group is 102. It is less than 103. So for b
, 103 is selected. Now the next group is 106 so:
103 * 1.03 = 106.09
106 is less than this number so 106.09 is selected. For the next group since 110 > 106.09 * 1.03, 110 is selected.
This is one of my attemps. But it feels like it is not the correct approach:
df['streak'] = df.a.ne(df.a.shift(1)).cumsum()
df['b'] = df.groupby('streak')['a'].apply(lambda x: x * 1.03)