So I’m learning how to use pandas, and I’m trying to group a df by the values in one of its columns and compute the mean of another column.
The original df is this:
> df.head()
USAF YR--MODAHRMN TEMP MAX MIN Celsius
0 28450 201705010000 31.0 NaN NaN -1.0
1 28450 201705010020 30.0 NaN NaN -1.0
2 28450 201705010050 30.0 NaN NaN -1.0
3 28450 201705010100 31.0 NaN NaN -1.0
4 28450 201705010120 30.0 NaN NaN -1.0
df.dtypes
USAF int64
YR--MODAHRMN int64
TEMP float64
MAX float64
MIN float64
Celsius float64
The objective is to convert the column YR--MODAHRMN into a string, cut off the portion that is of no interest (the hour and minute), and use the resulting column to group the data in order to calculate the mean of Celsius for each group.
df["YR--MODAHRMN"]=df["YR--MODAHRMN"].astype(str)
df["YR--MODA"]=df["YR--MODAHRMN"].str[0:8]
df.head()
df.dtypes
USAF YR--MODAHRMN TEMP MAX MIN Celsius YR--MODA
0 28450 201705010000 31.0 NaN NaN -1.0 20170501
1 28450 201705010020 30.0 NaN NaN -1.0 20170501
2 28450 201705010050 30.0 NaN NaN -1.0 20170501
3 28450 201705010100 31.0 NaN NaN -1.0 20170501
4 28450 201705010120 30.0 NaN NaN -1.0 20170501
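As a sanity check on the slicing step, here is a minimal sketch that derives the same day key by parsing the timestamp instead of cutting the string. It assumes the values are always 12 digits in YYYYMMDDHHMM form; the names stamps, parsed and day_key are only illustrative and are not part of my real script.

```python
import pandas as pd

# Illustrative timestamps in the same integer layout as YR--MODAHRMN.
stamps = pd.Series([201705010000, 201705010020, 201705020050], name="YR--MODAHRMN")

# Parse as year, month, day, hour, minute (assumed fixed 12-digit layout).
parsed = pd.to_datetime(stamps.astype(str), format="%Y%m%d%H%M")

# Format back to a YYYYMMDD string, which should match the str[0:8] slice.
day_key = parsed.dt.strftime("%Y%m%d")
sliced = stamps.astype(str).str[0:8]
```

If both approaches agree, the string slice is probably not the source of the problem.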
df = df[["USAF","YR--MODA","Celsius"]]
df["MeanC"] = df["Celsius"].groupby(df["YR--MODA"]).mean(numeric_only=True)
df.dtypes
df.head()
USAF int64
YR--MODA object
Celsius float64
MeanC float64
dtype: object
USAF YR--MODA Celsius MeanC
0 28450 20170501 -1.0 NaN
1 28450 20170501 -1.0 NaN
2 28450 20170501 -1.0 NaN
3 28450 20170501 -1.0 NaN
4 28450 20170501 -1.0 NaN
All of the results are NaN even though the input data is numeric. I’ve run pd.isnull(df)
and it returned False on the Celsius column. What am I doing wrong?
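One possible explanation, sketched on made-up data rather than my real file: groupby(...).mean() returns one value per group, indexed by the group key, so assigning it to a column aligns on index labels that never match the row numbers. Assuming that is the cause, transform("mean") may be what is needed, since it returns a result with the original index. The frame and the names per_day and broken are illustrative only.

```python
import pandas as pd

# Small stand-in for the data in the question (values are illustrative).
df = pd.DataFrame({
    "USAF": [28450] * 4,
    "YR--MODA": ["20170501", "20170501", "20170502", "20170502"],
    "Celsius": [-1.0, -3.0, 2.0, 4.0],
})

# One row per group, indexed by the group key ("20170501", "20170502").
per_day = df["Celsius"].groupby(df["YR--MODA"]).mean()

# Assignment aligns on index labels: rows 0..3 never match the string keys,
# so every value in the new column comes out as NaN.
broken = df.assign(MeanC=per_day)

# transform("mean") broadcasts each group's mean back onto the original rows.
df["MeanC"] = df.groupby("YR--MODA")["Celsius"].transform("mean")
```

With this sketch, broken["MeanC"] is all NaN, while df["MeanC"] holds the per-day mean repeated on every row of that day.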