Issue Description
When using `pd.Grouper` with a defined frequency to group dates in pandas, dates that are expected to fall into the previous period (based on the anchor day of the frequency) are instead grouped into the following period.
Steps to Reproduce
Below is a Python code snippet that demonstrates the issue using the pandas library:
```python
import pandas as pd
import numpy as np

# Create a date range and dataframe
date_range = pd.date_range(start='2023-01-01', periods=10, freq='D')
data = {
    'id': np.arange(1, 11),
    'date': date_range,
    'value': np.random.randint(1, 100, size=len(date_range)),
}
df = pd.DataFrame(data)
# Day of week: Monday=0, ..., Sunday=6
df["dayweek"] = df["date"].dt.dayofweek
```
id | date | value | dayweek |
---|---|---|---|
1 | 2023-01-01 00:00:00 | 76 | 6 |
2 | 2023-01-02 00:00:00 | 47 | 0 |
```python
# Define grouping frequency
freq = "W-SAT"
grouped_df = df.groupby(["id", pd.Grouper(key="date", freq=freq)]).sum().reset_index()
```
id | date | value |
---|---|---|
1 | 2022-12-31 00:00:00 | 68 |
2 | 2022-12-31 00:00:00 | 82 |
Current Output
In `grouped_df`, the date for id=1 (2023-01-01, a Sunday) is mapped to 2023-01-07, the Saturday of the following period, instead of the expected 2022-12-31, the Saturday that starts the current period.
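The default behavior can be reproduced in isolation with a minimal sketch (the two-row frame below is hypothetical, not the random data above): a Sunday and the following Monday both land in a single group labeled 2023-01-07. This is consistent with the pandas `resample`/`Grouper` documentation, which lists `'W'` among the offsets whose `closed` and `label` parameters default to `'right'`, so each weekly bin is (previous Saturday, Saturday] and is named after its closing Saturday.

```python
import pandas as pd

# Hypothetical data: a Sunday (2023-01-01) and the following Monday (2023-01-02).
df = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-01", "2023-01-02"]),
    "value": [1, 2],
})

# Default Grouper settings for a weekly frequency anchored on Saturday.
out = df.groupby(pd.Grouper(key="date", freq="W-SAT"))["value"].sum()
print(out)
```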
Expected Behavior
For id=1, the date 2023-01-01 (a Sunday) should be grouped under the period starting on 2022-12-31 (the previous Saturday).
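Assuming the goal is weeks that start on Saturday and are labeled by that starting Saturday, one way to get this grouping is a sketch that passes `closed="left"` and `label="left"` to `pd.Grouper` (both are documented `Grouper` parameters). The three-row frame is hypothetical and chosen so that one date falls exactly on a Saturday boundary.

```python
import pandas as pd

# Hypothetical data: Sunday, Monday, and the following Saturday.
df = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-07"]),
    "value": [1, 2, 4],
})

# closed="left": each weekly bin is [Saturday, next Saturday).
# label="left": each bin is labeled by its starting Saturday.
grouper = pd.Grouper(key="date", freq="W-SAT", closed="left", label="left")
out = df.groupby(grouper)["value"].sum()
print(out)
```

Here the Sunday and Monday are grouped under 2022-12-31, while 2023-01-07 itself opens the next bin, which is labeled 2023-01-07.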
Question
Is this behavior intended for `pd.Grouper` whenever a frequency is defined? If so, could you explain the rationale behind this grouping logic?
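Whether the labels depend on the chosen frequency can be checked with a small sketch (hypothetical data; `grouped` is an illustrative helper, not a pandas function) that compares the default output against explicit `closed`/`label` settings. For `W-SAT` the default matches `closed="right", label="right"`, while for a daily frequency it matches `closed="left", label="left"`, so the defaults are frequency-dependent rather than fixed for every `pd.Grouper`.

```python
import pandas as pd

# Hypothetical data: a Sunday and the following Monday.
df = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-01", "2023-01-02"]),
    "value": [1, 2],
})


def grouped(**grouper_kwargs):
    """Hypothetical helper: sum 'value' per time bin using the given Grouper options."""
    return df.groupby(pd.Grouper(key="date", **grouper_kwargs))["value"].sum()


# Weekly offsets: defaults behave like closed="right", label="right".
weekly_default = grouped(freq="W-SAT")
weekly_right = grouped(freq="W-SAT", closed="right", label="right")

# Daily offsets: defaults behave like closed="left", label="left".
daily_default = grouped(freq="D")
daily_left = grouped(freq="D", closed="left", label="left")

print(weekly_default.equals(weekly_right))
print(daily_default.equals(daily_left))
```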