I have a pandas DataFrame where one column consists of datetime.date objects. I was hoping to use groupby() to count all entries in a given year and month without adding unnecessary columns to the dataframe.
The way I know how to do this is to add new columns to the dataframe, e.g. using year_column = df['Date'].apply(lambda x: x.year) and month_column = df['Date'].apply(lambda x: x.month), and thereafter using groupby(), but it feels like there should be a quicker way, without changing the dataframe, similar to SQL GROUP BY.
Does pandas allow for a slick way to group by some function applied to a column?
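First of all, you don’t need apply; you can get the year/month directly with df['Date'].dt.year / df['Date'].dt.month (see dt.year/dt.month). The .dt accessor requires a datetime64 column, so a column of datetime.date objects (object dtype) must first be converted with pd.to_datetime(df['Date']).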
Then you can group by an external Series; it doesn’t need to be an existing column of your DataFrame:
df.groupby(df['Date'].dt.year)
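To count entries per year and month, as the question asks, you can pass a list of external Series. Here is a minimal sketch; the sample data, the "value" column and the "year"/"month" labels are made up for illustration:

```python
import datetime

import pandas as pd

# Hypothetical sample data: a column of datetime.date objects, as in the question.
df = pd.DataFrame({
    "Date": [
        datetime.date(2023, 1, 5),
        datetime.date(2023, 1, 20),
        datetime.date(2023, 2, 3),
        datetime.date(2024, 1, 7),
    ],
    "value": [10, 20, 30, 40],
})

# datetime.date objects are stored with object dtype, so convert to
# datetime64 in a separate Series before using the .dt accessor.
dates = pd.to_datetime(df["Date"])

# Group by two external Series; rename() only labels the resulting index levels.
counts = df.groupby(
    [dates.dt.year.rename("year"), dates.dt.month.rename("month")]
).size()

print(counts)
```

The original df keeps its two columns; the year and month Series exist only for the duration of the groupby call.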
Note that you can also use pd.Grouper with the YS/YE frequencies to group by year. It cannot group by month of the year (Jan-Dec across all years), though; a monthly frequency only yields Year-Month periods.
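To illustrate the difference, here is a small sketch, assuming the Date column has already been converted to datetime64. The sample data is made up, and the month-start alias MS is used because, as far as I know, the YE alias is only accepted by recent pandas releases:

```python
import pandas as pd

# Hypothetical sample data with a datetime64 "Date" column.
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2023-01-05", "2023-01-20", "2023-02-03", "2024-01-07"]
    ),
    "value": [10, 20, 30, 40],
})

# pd.Grouper with a monthly frequency bins rows into year-month periods,
# so January 2023 and January 2024 end up in different groups.
monthly_all = df.groupby(pd.Grouper(key="Date", freq="MS")).size()

# Months without any rows appear as empty bins; keep only the non-empty ones.
monthly = monthly_all[monthly_all > 0]

# Grouping by month of the year (all Januaries together) needs the .dt accessor.
by_month_of_year = df.groupby(df["Date"].dt.month).size()

print(monthly)
print(by_month_of_year)
```

The Grouper result is indexed by the first day of each year-month period, while the .dt.month result is indexed by the month number alone.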