I can’t work out how .mean() knows to calculate on the OVERALL_LI column in the final line of this code block.
import pandas as pd
education_districtwise = pd.read_csv('education_districtwise.csv')
mean_overall_li = education_districtwise['OVERALL_LI'].mean()
mean_overall_li
std_overall_li = education_districtwise['OVERALL_LI'].std()
lower_limit = mean_overall_li - 1 * std_overall_li
upper_limit = mean_overall_li + 1 * std_overall_li
# This is the line that confuses me
((education_districtwise['OVERALL_LI'] >= lower_limit) & (education_districtwise['OVERALL_LI'] <= upper_limit)).mean()
I understand that the column is being used to filter to just the rows within 1 standard deviation of the mean, but not how the same column is then having its mean calculated.
I tried breaking it apart using things like:
(education_districtwise['OVERALL_LI'] >= lower_limit)
I assumed this would give me a list of numbers from that column, but instead it outputs:
0     True
1     True
2     True
3    False
The code works correctly; I just don’t understand why.
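For what it's worth, here is a minimal sketch of what seems to be going on, using a made-up Series rather than the real data (`scores`, `lower` and `upper` are invented for illustration). A comparison on a pandas Series returns a boolean Series, `&` combines two of them element by element, and `.mean()` on booleans counts `True` as 1 and `False` as 0. So the result is the fraction of rows inside the limits, not the mean of the column values.

```python
import pandas as pd

# Hypothetical values and limits, chosen only to make the arithmetic easy to follow.
scores = pd.Series([50.0, 60.0, 70.0, 80.0, 90.0])
lower, upper = 55.0, 85.0

# Each comparison yields a boolean Series; & combines them element-wise.
in_range = (scores >= lower) & (scores <= upper)
print(in_range.tolist())   # [False, True, True, True, False]

# Taking the mean of booleans treats True as 1 and False as 0,
# so this is (number of True values) / (number of rows).
fraction = in_range.mean()
print(fraction)            # 0.6, i.e. 3 of the 5 values fall inside the limits
```

Read this way, the final line never averages OVERALL_LI at all; it averages the True/False results of the two comparisons.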
education_districtwise.csv:
DISTNAME,STATNAME,BLOCKS,VILLAGES,CLUSTERS,TOTPOPULAT,OVERALL_LI
DISTRICT32,STATE1,13,391,104,875564.0,66.92
DISTRICT649,STATE1,18,678,144,1015503.0,66.93
DISTRICT229,STATE1,8,94,65,1269751.0,71.21
DISTRICT259,STATE1,13,523,104,735753.0,57.98
DISTRICT486,STATE1,8,359,64,570060.0,65.0
DISTRICT323,STATE1,12,523,96,1070144.0,64.32
DISTRICT114,STATE1,6,110,49,147104.0,80.48
DISTRICT438,STATE1,7,134,54,143388.0,74.49
DISTRICT610,STATE1,10,388,80,409576.0,65.97
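To contrast the two readings on the sample rows above, here is a rough sketch that inlines those nine rows with `io.StringIO` so it runs without the file. The limits are computed from these nine rows only, so they will not match the limits from the full dataset, and the names `sample`, `col`, `mask`, `fraction_in_range` and `mean_of_filtered` are my own, not from the original code.

```python
import io
import pandas as pd

# The sample rows from the question, inlined so no CSV file is needed.
csv_text = """DISTNAME,STATNAME,BLOCKS,VILLAGES,CLUSTERS,TOTPOPULAT,OVERALL_LI
DISTRICT32,STATE1,13,391,104,875564.0,66.92
DISTRICT649,STATE1,18,678,144,1015503.0,66.93
DISTRICT229,STATE1,8,94,65,1269751.0,71.21
DISTRICT259,STATE1,13,523,104,735753.0,57.98
DISTRICT486,STATE1,8,359,64,570060.0,65.0
DISTRICT323,STATE1,12,523,96,1070144.0,64.32
DISTRICT114,STATE1,6,110,49,147104.0,80.48
DISTRICT438,STATE1,7,134,54,143388.0,74.49
DISTRICT610,STATE1,10,388,80,409576.0,65.97
"""
sample = pd.read_csv(io.StringIO(csv_text))
col = sample['OVERALL_LI']

# Same construction as in the question, but on the nine sample rows only.
lower_limit = col.mean() - col.std()
upper_limit = col.mean() + col.std()
mask = (col >= lower_limit) & (col <= upper_limit)

# Mean of the boolean mask: the share of rows within one standard deviation.
fraction_in_range = mask.mean()

# Mean of the filtered values: this is what actual filtering would average.
mean_of_filtered = col[mask].mean()

print(fraction_in_range)
print(mean_of_filtered)
```

The first number is a proportion between 0 and 1, while the second is a literacy value on the same scale as OVERALL_LI, which is how the two operations can be told apart.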
I am breaking the code down into its elements to try to understand it better.