I have a problem with resample in pandas.
There is a dataframe with the data specified below.
Datetime as an index
I need to generate a resample of a 3-hour interval, even if there is only 2 hours of data at the end of the day
2024-07-17 09:00:00 1025.6 1025.6 1025.6 1025.6 5343
2024-07-17 10:00:00 1025.4 1026.2 1013.0 1019.4 417547
2024-07-17 11:00:00 1019.6 1038.0 1019.4 1030.6 648291
2024-07-17 12:00:00 1030.2 1039.0 1030.2 1036.2 245209
2024-07-17 13:00:00 1036.0 1038.0 1029.0 1031.2 168755
2024-07-17 14:00:00 1031.2 1034.6 1025.6 1029.4 146726
2024-07-17 15:00:00 1029.6 1036.8 1026.2 1033.8 144680
2024-07-17 16:00:00 1033.8 1047.6 1032.6 1046.6 421860
2024-07-17 17:00:00 1046.2 1051.6 1043.8 1047.4 480781
2024-07-17 18:00:00 1047.6 1047.8 1030.8 1035.6 391968
2024-07-17 19:00:00 1037.2 1044.4 1034.0 1040.0 139914
2024-07-17 20:00:00 1039.8 1040.0 1032.0 1035.2 79836
2024-07-17 21:00:00 1035.0 1040.4 1034.8 1040.0 41185
2024-07-17 22:00:00 1039.8 1043.0 1035.2 1037.0 47903
2024-07-17 23:00:00 1037.0 1037.6 1034.0 1034.6 37957
2024-07-18 09:00:00 1036.2 1036.2 1036.2 1036.2 1403
2024-07-18 10:00:00 1036.2 1043.0 1026.4 1034.0 430315
2024-07-18 11:00:00 1033.8 1034.6 1022.8 1027.2 461764
2024-07-18 12:00:00 1027.2 1038.0 1027.0 1037.0 205564
2024-07-18 13:00:00 1037.0 1042.8 1033.2 1042.2 290967
2024-07-18 14:00:00 1042.2 1047.0 1037.4 1044.6 265148
2024-07-18 15:00:00 1044.2 1047.0 1037.4 1042.4 226078
2024-07-18 16:00:00 1042.2 1056.4 1041.4 1053.2 425763
2024-07-18 17:00:00 1053.6 1060.0 1048.2 1056.8 347094
2024-07-18 18:00:00 1057.2 1065.8 1057.2 1065.6 376461
2024-07-18 19:00:00 1066.0 1070.0 1062.4 1067.4 237408
2024-07-18 20:00:00 1068.0 1069.0 1062.8 1068.4 63951
2024-07-18 21:00:00 1068.4 1070.0 1065.0 1065.8 60839
2024-07-18 22:00:00 1065.8 1070.0 1065.2 1069.4 68737
2024-07-18 23:00:00 1069.0 1069.6 1068.8 1069.0 2421
I need to do a resample starting at 7 a.m.:
df = df.resample('3H', offset='07h00min').agg(
{'open': 'first', 'high': 'max', 'low': 'min', 'close': 'last', 'volume': 'sum'})
At the output, I get the following dataframe:
2024-07-17 07:00:00 1025.6 1025.6 1025.6 1025.6 5343
2024-07-17 10:00:00 3075.2 3103.2 3062.6 3086.2 1311047
2024-07-17 13:00:00 3096.8 3109.4 3080.8 3094.4 460161
2024-07-17 16:00:00 3127.6 3147.0 3107.2 3129.6 1294609
2024-07-17 19:00:00 3112.0 3124.8 3100.8 3115.2 260935
2024-07-18 07:00:00 1036.2 1036.2 1036.2 1036.2 1403
2024-07-18 10:00:00 3097.2 3115.6 3076.2 3098.2 1097643
2024-07-18 13:00:00 3123.4 3136.8 3108.0 3129.2 782193
2024-07-18 16:00:00 3153.0 3182.2 3146.8 3175.6 1149318
2024-07-18 19:00:00 3202.4 3209.0 3190.2 3201.6 362198
As you can see, the data for 22 and 23 pm do not fall into the new dataframe, since there are no full 3 hours. How do I make sure that in any case, data is generated for 22 pm, which will consist only of a 2-hour period?
Ruslan is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
4
You can use the resample function with a ‘left’ fill method to include the last value for each day, even if it’s not a full 3-hour period. Here is how you can do it:
df = df.resample('3H', offset='07h00min').agg(
{'open': 'first', 'high': 'max', 'low': 'min', 'close': 'last', 'volume': 'sum'}).fillna(method='ffill')
This will fill the missing values with the last available values, which in your case would be the values for 22:00 and 23:00 for the last day.