Have weather data set that includes daily high temperature in Celsius in a pandas dataframe that is as simple as the date and daily high temperature (rounded to the tenth value). Here is a sample data set:
data_dict = {
'dates': ['2023-07-01', '2023-07-02', '2023-07-03', '2023-07-04', '2023-07-05', '2023-07-06', '2023-07-07', '2023-07-08', '2023-07-09', '2023-07-10', '2023-07-11', '2023-07-12', '2023-07-13', '2023-07-14', '2023-07-15', '2023-07-16', '2023-07-17', '2023-07-18', '2023-07-19', '2023-07-20', '2023-07-21', '2023-07-22', '2023-07-23', '2023-07-24', '2023-07-25', '2023-07-26', '2023-07-27', '2023-07-28', '2023-07-29', '2023-07-30', '2023-07-31', '2023-08-01', '2023-08-02'],
'daily_high_temp': [39.1, 39.8, 40, 40.3, 40.4, 40.2, 40.4, 40.6, 41, 41.1, 40.9, 41.2, 40.9, 39.9, 41.2, 42, 42.3, 41.9, 40.7, 39.8, 41.1, 41.3, 40.9, 40.7, 40, 39.8, 41.2, 40.9, 39.6, 40.9, 41.4, 41.2, 41.4]
}
df = pd.DataFrame(data=data_dict)
Want to create another dataframe column ‘extreme_highs’ that logs extreme high temps with several conditions. Those conditions:
- For dates/temps not logged set value = 0
- Only temps greater than 40 degrees Celsius are eligible for logging.
- Starting at earliest date, identify temp greater than 40 degrees.
- Considering that date/temp and the proceeding 3 days immediately following that date, log the date/temp with the max temp.
- If two or more date/temps in the four-day window share the max temp, only log the date/temp in which the max temp first occurred (earliest occurrence)
- Once a date/temp has been logged, the next six days are ineligible for logging. Only one date/temp in a seven day period is to be logged.
Based on the data above and conditions listed, the resulting data should be (hopefully) produced:
|date|daily_high_temp|extreme_highs|
|:—–:|:——————:|:—————–:|
|2023-07-01|39.1|0|
|2023-07-02|39.8|0|
|2023-07-03|40|0|
|2023-07-04|40.3|0|
|2023-07-05|40.4|40.4|
|2023-07-06|40.2|0|
|2023-07-07|40.4|0|
|2023-07-08|40.6|0|
|2023-07-09|41|0|
|2023-07-10|41.1|0|
|2023-07-11|40.9|0|
|2023-07-12|41.2|41.2|
|2023-07-13|40.9|0|
|2023-07-14|39.9|0|
|2023-07-15|41.2|0|
|2023-07-16|42|0|
|2023-07-17|42.3|0|
|2023-07-18|41.9|0|
|2023-07-19|40.7|0|
|2023-07-20|39.8|0|
|2023-07-21|41.1|0|
|2023-07-22|41.3|41.3|
|2023-07-23|40.9|0|
|2023-07-24|40.7|0|
|2023-07-25|40|0|
|2023-07-26|39.8|0|
|2023-07-27|41.2|0|
|2023-07-28|40.9|0|
|2023-07-29|39.6|0|
|2023-07-30|40.9|0|
|2023-07-31|41.4|41.4|
|2023-08-01|41.2|0|
|2023-08-02|41.4|0|
I have developed a solution to this in excel, but it is convoluted and complicated. Hoping that a simple “pythonic” solution can be shared because I am stumped! Thank you for your help!!!