My question is nearly identical to "Finding frequency of each value in all categorical columns across a dataframe", but I need the probabilities instead of the frequencies. We can use the same example dataframe:
import pandas as pd

df = pd.DataFrame(
    {'sub_code' : ['CSE01', 'CSE01', 'CSE01',
                   'CSE02', 'CSE03', 'CSE04',
                   'CSE05', 'CSE06'],
     'stud_level' : [101, 101, 101, 101,
                     101, 101, 101, 101],
     'grade' : ['STA','STA','PSA','STA','STA','SSA','PSA','QSA']})
I tried adapting this answer (/a/70811258) in the following way:
out = (df.select_dtypes(object)
         .melt(var_name="Variable", value_name="Class")
         .value_counts(dropna=False, normalize=True)
         .reset_index(name="Probability")
         .sort_values(by=['Variable','Class'], ascending=[True,True])
         .reset_index(drop=True))
However, this doesn’t give what I need: within each variable, the class probabilities do not sum to 1. What am I doing wrong?
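
A minimal sketch of what may be going on, and one possible way around it. It assumes the cause is that `normalize=True` divides each count by the total number of rows in the melted frame (8 rows x 2 non-numeric columns = 16) rather than by the rows of each variable. The names `melted` and `pooled` are introduced here for illustration only, and `select_dtypes(exclude="number")` is swapped in for `select_dtypes(object)` so the sketch does not depend on how a given pandas version stores strings.

```python
import pandas as pd

df = pd.DataFrame({
    'sub_code': ['CSE01', 'CSE01', 'CSE01', 'CSE02',
                 'CSE03', 'CSE04', 'CSE05', 'CSE06'],
    'stud_level': [101] * 8,
    'grade': ['STA', 'STA', 'PSA', 'STA', 'STA', 'SSA', 'PSA', 'QSA'],
})

# Long format: one row per (Variable, Class) observation.
# Assumption: "categorical" here means every non-numeric column.
melted = (df.select_dtypes(exclude="number")
            .melt(var_name="Variable", value_name="Class"))

# Same normalization as in the attempt above: counts are divided by the
# size of the whole melted frame, so each variable's share sums to 0.5.
pooled = (melted.value_counts(dropna=False, normalize=True)
                .reset_index(name="Probability"))
print(pooled.groupby("Variable")["Probability"].sum())

# Possible fix: normalize within each variable by grouping first.
out = (melted.groupby("Variable")["Class"]
             .value_counts(dropna=False, normalize=True)
             .rename("Probability")
             .reset_index()
             .sort_values(by=["Variable", "Class"])
             .reset_index(drop=True))
print(out.groupby("Variable")["Probability"].sum())
```

Grouping by `Variable` before calling `value_counts` makes the denominator the number of rows for that variable only, which is why the per-variable sums come out as 1 in this sketch.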