Given the following dataframe:
Year 2024 2023 2022
Header N Result SD N Result SD N Result SD
Vendor
A 5 20 3 5 22 4 1 21 3
B 4 25 2 4 25 3 4 26 5
C 9 22 3 9 27 1 3 23 3
D 3 23 5 3 16 2 5 13 4
E 5 27 2 5 21 3 3 19 5
I would like to calculate for each year the mean value of the results column and then create a column, where the relative deviation to the mean is displayed (e.g. Results Value / mean-value * 100). The N and SD column were just included for completeness and is not needed for the calculation.
Year 2024 2023 2022
Header N Result SD Deviation N Result SD Deviation N Result SD Deviation
Vendor
A 5 20 3 85.5 5 22 4 99.1 1 21 3 ..
B 4 25 2 106 4 25 3 113 4 26 5 ..
C 9 22 3 .. 9 27 1 .. 3 23 3 ..
D 3 23 5 .. 3 16 2 .. 5 13 4 ..
E 5 27 2 .. 5 21 3 .. 3 19 5 ..
How what i be able to achieve that?
Thanks a lot in advance!
1
Use DataFrame.xs
for select Result
labels in MultiIndex
, divide by mean and append to original in concat
, last for correct position add DataFrame.sort_index
with parameter sort_remaining=False
:
df1 = df.xs('Result', axis=1, level=1, drop_level=False)
out = (pd.concat([df,
df1.div(df1.mean()).mul(100)
.rename(columns={'Result':'Deviation'})], axis=1)
.sort_index(axis=1, ascending=False, level=0, sort_remaining=False))
print (out)
2024 2023 2022
N Result SD Deviation N Result SD Deviation N Result SD
A 5 20 3 85.470085 5 22 4 99.099099 1 21 3
B 4 25 2 106.837607 4 25 3 112.612613 4 26 5
C 9 22 3 94.017094 9 27 1 121.621622 3 23 3
D 3 23 5 98.290598 3 16 2 72.072072 5 13 4
E 5 27 2 115.384615 5 21 3 94.594595 3 19 5
Deviation
A 102.941176
B 127.450980
C 112.745098
D 63.725490
E 93.137255
Another loop idea:
for x in df.columns.levels[0]:
df[(x, 'Deviation')] = df[(x, 'Result')].div(df[(x, 'Result')].mean()).mul(100)
out = df.sort_index(axis=1, ascending=False, level=0, sort_remaining=False)
print (out)
2024 2023 2022
N Result SD Deviation N Result SD Deviation N Result SD
A 5 20 3 85.470085 5 22 4 99.099099 1 21 3
B 4 25 2 106.837607 4 25 3 112.612613 4 26 5
C 9 22 3 94.017094 9 27 1 121.621622 3 23 3
D 3 23 5 98.290598 3 16 2 72.072072 5 13 4
E 5 27 2 115.384615 5 21 3 94.594595 3 19 5
Deviation
A 102.941176
B 127.450980
C 112.745098
D 63.725490
E 93.137255
1
You can use xs
to select the results, then perform your computation and concat
to the original DataFrame with reordering of the columns using sort_index
and a stable sort:
# select Result
res = df.xs('Result', level='Header', axis=1)
# compute the deviation
tmp = (pd.concat({'Deviation': res.div(res.mean()).mul(100).round(2)},
names=['Header'], axis=1)
.swaplevel(axis=1)
)
# combine to original dataset
out = (pd.concat([df, tmp], axis=1)
.sort_index(ascending=False, axis=1, level=0,
sort_remaining=False, kind='stable')
)
Or, using reshaping with stack
/unstack
and groupby.transform
:
out = (df.stack('Year', sort=False)
.assign(Deviation=lambda x: x['Result'].div(x.groupby('Year')['Result']
.transform('mean')
).mul(100).round(2))
.unstack('Year').swaplevel(axis=1)
.sort_index(ascending=False, axis=1, level=0,
sort_remaining=False, kind='stable')
)
Output:
Year 2024 2023 2022
Header N Result SD Deviation N Result SD Deviation N Result SD Deviation
Vendor
A 5 20 3 85.47 5 22 4 99.10 1 21 3 102.94
B 4 25 2 106.84 4 25 3 112.61 4 26 5 127.45
C 9 22 3 94.02 9 27 1 121.62 3 23 3 112.75
D 3 23 5 98.29 3 16 2 72.07 5 13 4 63.73
E 5 27 2 115.38 5 21 3 94.59 3 19 5 93.14