My goal is to group a datetime-indexed dataframe by month and write it to a single parquet file, chunked (row-grouped) by month.
The reason is that I have many dataframes, each stored as its own parquet file. When I read them back, I usually don't need the whole file, only a few months from each.
For example, a typical dataframe looks like this:
                    Level  Score
2023-10-21 00:01:00     A      1
2023-10-22 00:01:00     B      2
...
2023-11-01 00:01:00     C      3
...
2023-11-30 00:01:00     D      4
2023-12-01 00:02:00     A      5
...
2023-12-22 00:02:00     B      6
I would like to have one parquet file for this dataframe, but with the data grouped / chunked by month inside it, so that I can read, e.g., only the 2023-10 data from the file without touching the other months.
I have been reading about parquet in pyarrow and see that it has chunk sizes, row groups, etc., but I am confused by them.
How can I group the chunks / rows by month?