My input is a dataframe with one column :
import pandas as pd
df = pd.DataFrame({'ID': ['A1-B1', 'A1-B2', 'A1-B3', 'A1-B8', 'A2-B10', 'A2-B16', 'A2-B9', 'A3-B13', 'A3-B14']})
I’m trying to create a second column that will identify the groups based on two things :
- the Axx
- the sign change of Bxx
Group1 : In A1: the numbers of Bxx are increasing = B1 to B8
Group2 : In A2, the numbers of Bxx are first increasing = B10 to B16
Group3 : In A2, the numbers of Bxx then decreased to B9
Group4 : In A3, the number of Bxx are increasing = B13 to B14
My expected output is this :
ID GROUP
0 A1-B1 1
1 A1-B2 1
2 A1-B3 1
3 A1-B8 1
4 A2-B10 2
5 A2-B16 2
6 A2-B9 3
7 A3-B13 4
8 A3-B14 4`
I tried using an idea of findall and ngroup but it didn’t goes well :
new_df = df['ID'].str.findall(r'd+').apply(lambda x: pd.Series(x))
new_df.columns = ['Axx', 'Bxx']
new_df['GROUP'] = new_df.groupby(['Axx', 'Bxx']).ngroup()
print(new_df)
Axx Bxx GROUP
0 1 1 0
1 1 2 1
2 1 3 2
3 1 8 3
4 2 10 4
5 2 16 5
6 2 9 6
7 3 13 7
8 3 14 8
Do you guys have a proposition ?