I am trying to calculate percentiles from percentile(0.05) to percentile(0.95) in steps of 0.05. I ran some tests, and one of the implementations gave me weird results that I can't explain. To illustrate, my sample code uses only percentile(0.05) and percentile(0.85). I ran the code in Jupyter Notebook.
If I list the percentile functions explicitly, they seem to work correctly.
import pandas as pd
quantile_funcs = [lambda x: x.quantile(0.05), lambda x: x.quantile(0.85)]
data = {'group':['A', 'B', 'A', 'A', 'B', 'B', 'B', 'A', 'A'], 'col1':[1, 3, 3, 5, 4, 6, 7, 7, 8]}
df = pd.DataFrame(data)
df['col2']=df['col1']+1
df_sum = df.groupby(['group']).agg([*quantile_funcs, "count", 'mean'])
df_sum.columns = ['c1_P5', 'c1_p85', 'c1_ct', 'c1_mean', 'c2_P5', 'c2_p85', 'c2_ct', 'c2_mean']
print(df_sum)
The results are correct, and quantile_funcs points to two lambda functions under __main__. See screenshot below.
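As a cross-check of which numbers the first version should produce, here is a small pandas-free sketch of linear interpolation between closest ranks, which is assumed here to match the default of pandas' Series.quantile (interpolation='linear'). The helper name linear_quantile is hypothetical. For col1 it gives 1.4 and 7.4 for group A and 3.15 and 6.55 for group B; since col2 is col1 + 1, each col2 quantile is exactly 1 higher.

```python
import math


def linear_quantile(values, q):
    """Hypothetical helper: quantile via linear interpolation between the
    two closest ranks (assumed to mirror pandas' interpolation='linear')."""
    xs = sorted(values)
    h = (len(xs) - 1) * q          # fractional position in the sorted data
    lo = math.floor(h)             # index of the lower neighbour
    hi = min(lo + 1, len(xs) - 1)  # index of the upper neighbour (clamped)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


# col1 values per group, taken from the sample data
groups = {'A': [1, 3, 5, 7, 8], 'B': [3, 4, 6, 7]}
for name, vals in groups.items():
    print(name, round(linear_quantile(vals, 0.05), 2), round(linear_quantile(vals, 0.85), 2))
```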
However, if I don't list the percentile functions explicitly but instead generate the list in a loop (a list comprehension), I get weird results.
import pandas as pd
percentiles = [0.05, 0.85]
quantile_funcs = [lambda x: x.quantile(p) for p in percentiles]
data = {'group':['A', 'B', 'A', 'A', 'B', 'B', 'B', 'A', 'A'], 'col1':[1, 3, 3, 5, 4, 6, 7, 7, 8]}
df = pd.DataFrame(data)
df['col2']=df['col1']+1
df_sum = df.groupby(['group']).agg([*quantile_funcs, "count", 'mean'])
df_sum.columns = ['c1_P5', 'c1_p85', 'c1_ct', 'c1_mean', 'c2_P5', 'c2_p85', 'c2_ct', 'c2_mean']
print(df_sum)
The results are wrong: it seems only the last quantile in the list is used. This time quantile_funcs points to two lambda functions under __main__.<listcomp>. See screenshot below.
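The "only the last quantile is used" symptom can be reproduced without pandas. The sketch below uses a hypothetical FakeSeries stand-in whose quantile() simply returns the q it receives, so each lambda reports which p it actually uses. Under Python's closure rules (late binding), each lambda looks up p when it is called, not when it is created, and by the time agg calls them the comprehension has finished with p == 0.85.

```python
class FakeSeries:
    """Hypothetical stand-in for a pandas Series: quantile() just echoes q."""

    def quantile(self, q):
        return q


percentiles = [0.05, 0.85]
# Same comprehension as in the question: every lambda closes over the same p.
quantile_funcs = [lambda x: x.quantile(p) for p in percentiles]

s = FakeSeries()
# Both calls see the final value of p from the comprehension.
print([f(s) for f in quantile_funcs])
```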
I am not sure how to change the second implementation to make it work. I'd appreciate your help. Thanks.
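A commonly used way to bind each value at creation time is to capture p as a default argument, or to build each lambda in a small factory function. This is a sketch using the same hypothetical FakeSeries stand-in; make_quantile_func is likewise an illustrative name. In the pandas code the comprehension would become [lambda x, p=p: x.quantile(p) for p in percentiles], on the assumption that agg calls each function with only the group's Series so the default stays in effect.

```python
class FakeSeries:
    """Hypothetical stand-in for a pandas Series: quantile() just echoes q."""

    def quantile(self, q):
        return q


percentiles = [0.05, 0.85]

# Option 1: p=p evaluates p on each iteration and stores it as a default.
quantile_funcs = [lambda x, p=p: x.quantile(p) for p in percentiles]


# Option 2: a factory gives each lambda its own enclosing scope holding p.
def make_quantile_func(p):
    return lambda x: x.quantile(p)


factory_funcs = [make_quantile_func(p) for p in percentiles]

s = FakeSeries()
print([f(s) for f in quantile_funcs])
print([f(s) for f in factory_funcs])
```

The factory version avoids one pitfall of the default-argument trick: if a caller ever passed a second positional argument, it would silently replace p.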