I am new to data science and machine learning.
I am trying to understand how DataFrame values are normalized. Using the well-known Titanic dataset, here is a query and its result:
dftitanic.groupby('Fsize')['Survived'].value_counts(normalize=False).reset_index(name='perc')
Result:
    Fsize  Survived  perc
0       1         0   374
1       1         1   163
2       2         1    89
3       2         0    72
4       3         1    59
5       3         0    43
6       4         1    21
7       4         0     8
8       5         0    12
9       5         1     3
10      6         0    19
11      6         1     3
12      7         0     8
13      7         1     4
14      8         0     6
15     11         0     7
If I use .value_counts(normalize=True) instead, the result is:
dftitanic.groupby('Fsize')['Survived'].value_counts(normalize=True).reset_index(name='perc')
    Fsize  Survived      perc
0       1         0  0.696462
1       1         1  0.303538
2       2         1  0.552795
3       2         0  0.447205
4       3         1  0.578431
5       3         0  0.421569
6       4         1  0.724138
7       4         0  0.275862
8       5         0  0.800000
9       5         1  0.200000
10      6         0  0.863636
11      6         1  0.136364
12      7         0  0.666667
13      7         1  0.333333
14      8         0  1.000000
15     11         0  1.000000
And here is the output of describe() on the first result (the raw counts):
         Fsize   Survived        perc
count  16.0000  16.000000   16.000000
mean    4.6875   0.437500   55.687500
std     2.7500   0.512348   95.378347
min     1.0000   0.000000    3.000000
25%     2.7500   0.000000    6.750000
50%     4.5000   0.000000   15.500000
75%     6.2500   1.000000   62.250000
max    11.0000   1.000000  374.000000
My effort:
From /a/41532180, I got the following methods:
- normalized_df = (df - df.mean()) / df.std()
- normalized_df = (df - df.min()) / (df.max() - df.min())
However, judging from the describe() output, neither of these two methods reproduces the results of .value_counts(normalize=True).
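To make the mismatch concrete, here is a minimal sketch that applies both formulas to the perc column of the first result. The DataFrame `counts_df` is a hypothetical name, rebuilt by hand from the counts shown above; both formulas use statistics of the whole column, so the first row comes out far from the 0.696462 reported by normalize=True.

```python
import pandas as pd

# Hypothetical reconstruction of the first result table (raw counts).
counts_df = pd.DataFrame({
    "Fsize":    [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 11],
    "Survived": [0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0],
    "perc":     [374, 163, 89, 72, 59, 43, 21, 8, 12, 3, 19, 3, 8, 4, 6, 7],
})

perc = counts_df["perc"]

# Method 1: standardization (z-score) over the whole column.
zscore = (perc - perc.mean()) / perc.std()

# Method 2: min-max scaling over the whole column.
minmax = (perc - perc.min()) / (perc.max() - perc.min())

# Compare the first row (Fsize=1, Survived=0) under each method.
print(zscore.iloc[0], minmax.iloc[0])
```

Neither value is a share within the Fsize group, which suggests that value_counts(normalize=True) is doing something other than these two rescalings.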
A similar formula and description are given here, but they did not give me understandable results.
Question:
How is this normalization, i.e. .value_counts(normalize=True), being done?
Thanks a bunch.
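One way to check what normalize=True computes is to compare it with counts divided by the total of each Fsize group. The sketch below does that under stated assumptions: `dftitanic_small` is a hypothetical stand-in for dftitanic, rebuilt only for Fsize 1 and 2 from the counts shown above, so it is an illustration rather than the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for dftitanic: only Fsize 1 and 2, rebuilt from the counts above.
dftitanic_small = pd.DataFrame({
    "Fsize":    [1] * 374 + [1] * 163 + [2] * 89 + [2] * 72,
    "Survived": [0] * 374 + [1] * 163 + [1] * 89 + [0] * 72,
})

grouped = dftitanic_small.groupby("Fsize")["Survived"]

# Raw counts per (Fsize, Survived) pair.
counts = grouped.value_counts(normalize=False)

# Divide each count by the total count of its own Fsize group.
manual = counts / counts.groupby(level="Fsize").transform("sum")

# The built-in normalized result, for comparison.
builtin = grouped.value_counts(normalize=True)

# Sort both by index so the comparison does not depend on row order.
same = np.allclose(manual.sort_index().to_numpy(), builtin.sort_index().to_numpy())
print(manual)
print(same)
```

For Fsize 1 this gives 374 / (374 + 163) and 163 / (374 + 163), which agree with the 0.696462 and 0.303538 in the normalized output, so each group's values sum to 1.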