I can do this in R but have no idea how to do this in Python.
I have a pandas DataFrame with the columns sbj, num_item, visit, and height, and I want to add a baseline_height column.
Example of the desired result:
sbj | num_item | visit | height | baseline_height |
---|---|---|---|---|
1 | 1 | Baseline | 1.5 | 1.5 |
1 | 1 | Day 7 | 2 | 1.5 |
1 | 1 | Day 14 | 2.5 | 1.5 |
1 | 2 | Baseline | 1 | 1 |
1 | 2 | Day 7 | 1.5 | 1 |
1 | 2 | Day 14 | 2 | 1 |
2 | 1 | Baseline | 0.5 | 0.5 |
2 | 1 | Day 7 | 1 | 0.5 |
2 | 1 | Day 14 | 1.5 | 0.5 |
2 | 2 | Baseline | 3 | 3 |
2 | 2 | Day 7 | 3.5 | 3 |
2 | 2 | Day 14 | 4 | 3 |
I want to group by two variables, sbj and num_item, and create a new column called baseline_height. For each sbj and num_item combination, baseline_height should hold the value of height from the row where visit is "Baseline".
I tried several different approaches, and none of them worked:
df['baseline_height'] = df.groupby(by = ['sbj', 'num_item']).height[['visit' == "Baseline"]]
df['baseline_height'] = 0
df = df.loc[df.groupby(by = ['sbj', 'num_item'])['baseline_height'].apply('visit' == 'Baseline') == df['height']]
df['baseline_height'] = df.groupby(by = ['sbj', 'num_item']).apply(['height'][['visit'] == 'Baseline'])
df['baseline_height'] = df.groupby(by = ['sbj', 'num_item']).apply(df['height'][df["visit"]=='Baseline'])
df_grouped = df.groupby(by = ['sbj', 'num_item'])
df['baseline_height'] = df_grouped.height[df_grouped["visit"]=='Baseline']
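For reference, here is a minimal sketch of one way the result in the table might be produced. It assumes each sbj/num_item combination has exactly one "Baseline" row; the sample frame is rebuilt from the table above, and `baseline_only` is just an illustrative name. It is one possible approach, not necessarily the most idiomatic one.

```python
import pandas as pd

# Sample data rebuilt from the example table
df = pd.DataFrame({
    "sbj": [1] * 6 + [2] * 6,
    "num_item": [1, 1, 1, 2, 2, 2] * 2,
    "visit": ["Baseline", "Day 7", "Day 14"] * 4,
    "height": [1.5, 2, 2.5, 1, 1.5, 2, 0.5, 1, 1.5, 3, 3.5, 4],
})

# Keep height only on Baseline rows; every other row becomes NaN
baseline_only = df["height"].where(df["visit"] == "Baseline")

# "first" picks the first non-null value in each (sbj, num_item) group,
# and transform broadcasts it back to every row of that group
df["baseline_height"] = (
    baseline_only.groupby([df["sbj"], df["num_item"]]).transform("first")
)

print(df)
```

If a group had no "Baseline" row, this sketch would leave baseline_height as NaN for that group; if it had several, only the first one would be used.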