I have a dataframe with two columns (it can have more data columns).
I want to sort the rows according to the volatility of each row.
For example,
import pandas as pd
import numpy as np
df = pd.DataFrame({'item1': {'trial_1': 1.0,
'trial_2': 7.0,
'trial_3': 16053.2,
'trial_4': 16053.2,
'trial_5': 224685.5},
'item2': {'trial_1': 0.0,
'trial_2': 0.0,
'trial_3': 19340.1,
'trial_4': 19340.1,
'trial_5': 269635.8}})
The dataframe looks like this:
item1 item2
trial_1 1.0 0.0
trial_2 7.0 0.0
trial_3 16053.2 19340.1
trial_4 16053.2 19340.1
trial_5 224685.5 269635.8
To get the volatility percentage of two columns, we can use 100 * abs(item1 - item2) / item1 to calculate the percentage for each row.
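For example, a minimal sketch of that per-row percentage for the two-column case (the name pct is just illustrative):
# per-row volatility percentage for the two-column case
pct = 100 * (df['item1'] - df['item2']).abs() / df['item1']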
To make it general for 3 or more columns, my idea is to use the mean value of each row. That would be:
data_cols = ['item1', 'item2']
df['mean'] = df.T.describe().T['mean']
df['volatility'] = np.sum(
    np.abs(
        [(df[c] - df['mean']) / df['mean'] for c in data_cols]
    ), axis=0
)
df.sort_values('volatility')
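For example, for trial_1 the row mean is (1.0 + 0.0) / 2 = 0.5, so its volatility is |1.0 - 0.5| / 0.5 + |0.0 - 0.5| / 0.5 = 2.0.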
I wonder if this is a known algorithm in math or statistics? I googled but couldn't find a similar way to calculate the volatility.
You can vectorize your operations directly with mean, sub, div and the correct axis:
cols = list(df)  # or provide an explicit list of data columns
# compute the row-wise average
avg = df[cols].mean(axis=1)
df['mean'] = avg
# compute the volatility on all columns directly
df['volatility'] = df[cols].sub(avg, axis=0).div(avg, axis=0).abs().sum(axis=1)
Output:
item1 item2 mean volatility
trial_1 1.0 0.0 0.50 2.000000
trial_2 7.0 0.0 3.50 2.000000
trial_3 16053.2 19340.1 17696.65 0.185736
trial_4 16053.2 19340.1 17696.65 0.185736
trial_5 224685.5 269635.8 247160.65 0.181867
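To actually order the rows by this measure, as in the question, sort on the new column (pass ascending=False if you want the most volatile rows first):
df.sort_values('volatility')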