In this dataframe, I want to create a column ‘desired_output’ that holds the first value of ‘lumpsum’ for each combination of ‘ID’ and ‘index’ (that is, the first ‘lumpsum’ within each ‘index’ group of each ‘ID’).
import pandas as pd

data = [
[1, 12334, 1, 12334],
[1, 12334, 1, 12334],
[1, 12334, 1, 12334],
[1, 12334, 1, 12334],
[1, 34567, 1, 12334],
[1, 34567, 1, 12334],
[2, 45788, 1, 45788],
[2, 45788, 2, 45788],
[2, 23467, 2, 45788],
[2, 5678, 3, 5678],
[2, 4567, 3, 5678],
[3, 56832, 1, 56832],
[3, 43456, 1, 56832],
[3, 2378, 2, 2378],
[4, 6754, 1, 6754],
[4, 3456, 2, 3456]
]
columns = ['ID', 'lumpsum', 'index', 'desired_output']
df = pd.DataFrame(data, columns=columns)
print(df)
I tried to create the ‘desired_output’ column with the code below, naming the new column ‘test’.
df['test']=df.groupby('ID', 'index')['lumpsum'].transform('first')
The output completely ignored the grouping by ‘index’ and returned only the first ‘lumpsum’ value of each ‘ID’. How should I fix this?
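For reference, here is a minimal sketch of one likely fix. It assumes the cause is that groupby treats its first positional argument as the grouping key(s) and, in pandas versions where the second positional parameter is axis, reads 'index' as the axis rather than as a second key. Under that assumption, passing both column names as a single list should group on the ('ID', 'index') pairs. The data is the same as in the question; nothing else is new.

```python
import pandas as pd

# Same sample data as in the question: ID, lumpsum, index, desired_output.
data = [
    [1, 12334, 1, 12334],
    [1, 12334, 1, 12334],
    [1, 12334, 1, 12334],
    [1, 12334, 1, 12334],
    [1, 34567, 1, 12334],
    [1, 34567, 1, 12334],
    [2, 45788, 1, 45788],
    [2, 45788, 2, 45788],
    [2, 23467, 2, 45788],
    [2, 5678, 3, 5678],
    [2, 4567, 3, 5678],
    [3, 56832, 1, 56832],
    [3, 43456, 1, 56832],
    [3, 2378, 2, 2378],
    [4, 6754, 1, 6754],
    [4, 3456, 2, 3456],
]
columns = ['ID', 'lumpsum', 'index', 'desired_output']
df = pd.DataFrame(data, columns=columns)

# Pass both grouping columns as ONE list so each ('ID', 'index') pair
# forms its own group; 'first' then picks the first 'lumpsum' per group.
df['test'] = df.groupby(['ID', 'index'])['lumpsum'].transform('first')

# Compare the computed column with the hand-written expected column.
print(df[['ID', 'index', 'lumpsum', 'desired_output', 'test']])
print((df['test'] == df['desired_output']).all())
```

The only change from the original attempt is the list around the two column names; transform('first') keeps the result aligned with the original rows, so it can be assigned straight back as a column.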