I have the following data (NaN here is numpy.nan, so the dict is valid Python):

import numpy as np

data = {
    'Subject': ['3','3','3','3','3','3','3','3','3','10','10','10','10','10','10','10','10','10'],
    'Day': [1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'Value': [19.0959, 19.2321, 19.3088, 19.2589, 19.3085, 19.0455, 19.3491, np.nan, 19.1823, 25.7506, 25.8287, np.nan, 26.2913, np.nan, 26.1501, 25.9447, 25.9493, 25.9629]
}
which becomes this dataframe after pd.DataFrame(data):
Subject Day Value
0 3 1 19.0959
1 3 2 19.2321
2 3 3 19.3088
3 3 4 19.2589
4 3 5 19.3085
5 3 6 19.0455
6 3 7 19.3491
7 3 8 NaN
8 3 9 19.1823
9 10 1 25.7506
10 10 2 25.8287
11 10 3 NaN
12 10 4 26.2913
13 10 5 NaN
14 10 6 26.1501
15 10 7 25.9447
16 10 8 25.9493
17 10 9 25.9629
I have attempted to interpolate the missing data in the 'Value' column, but the interpolation does not stay within the Subject groups properly once a much larger dataframe with many more subjects is involved.
For example:
df['Value'] = df.groupby('Subject')['Value'].transform(lambda group: group.interpolate())
works for small datasets like the one shown, but when I apply the same code to a dataframe with 1,000 different subjects, each with 9 days of data, there are instances (e.g. subject 10, day 5) where the value still remains NaN after interpolation. Any advice on this? Thanks!
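For reference, here is a minimal self-contained version of what I am doing, run on the sample data above. The sort_values call and limit_direction='both' are additions I am experimenting with, on the guess that unsorted rows or NaNs at the edge of a group (which the default limit_direction='forward' leaves unfilled) might be what survives in the larger dataset:

```python
import numpy as np
import pandas as pd

data = {
    'Subject': ['3'] * 9 + ['10'] * 9,
    'Day': list(range(1, 10)) * 2,
    'Value': [19.0959, 19.2321, 19.3088, 19.2589, 19.3085, 19.0455,
              19.3491, np.nan, 19.1823,
              25.7506, 25.8287, np.nan, 26.2913, np.nan, 26.1501,
              25.9447, 25.9493, 25.9629],
}
df = pd.DataFrame(data)

# Make sure each group's values are in Day order before interpolating;
# unsorted rows silently change which neighbours interpolate() uses.
df = df.sort_values(['Subject', 'Day']).reset_index(drop=True)

# Linear interpolation within each Subject group; limit_direction='both'
# also fills NaNs at the start or end of a group, which the default
# ('forward') would leave as NaN.
df['Value'] = df.groupby('Subject')['Value'].transform(
    lambda s: s.interpolate(limit_direction='both')
)
```

On this small sample every NaN gets filled (subject 10, day 5 becomes the mean of its day-4 and day-6 neighbours), but I am unsure whether this also covers whatever is happening in the 1,000-subject case.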