I have the following dataframe:
state_territory_code week_nbr
CA WEEK 49
CA WEEK 49
FL WEEK 49
CA WEEK 50
TN WEEK 50
HI WEEK 50
GA WEEK 51
FL WEEK 51
I am trying to essentially perform the following:
- If the current week number during iteration is different than the
next (if it changes) – then take the date, add 7 days, and create a
new column.
The output should be:
state_territory_code week_nbr new_cal_date
CA WEEK 49 01/01/2024
CA WEEK 49 01/01/2024
FL WEEK 49 01/01/2024
CA WEEK 50 01/08/2024
TN WEEK 50 01/08/2024
HI WEEK 50 01/08/2024
GA WEEK 51 01/15/2024
FL WEEK 51 01/15/2024
Here is my code:
def cal_week_start(some_df):
start_dt = datetime.datetime(2024, 1, 1)
for i, wk in enumerate(some_df):
if i < (len(some_df)-1): ## Keep running as long as we haven't reached end of DF
next_value = some_df[i+1] ## index pos of next value
if wk != next_value: ## if value changes from one to the next do something
new_dt = start_dt + timedelta(days = 7) ## Add 7 days.
start_dt = new_dt ## reinitiate variable to add 7 days, 14, 21, etc.
return start_dt
else:
return start_dt
df_new['new_cal_date'] = df_new['week_nbr'].apply(cal_week_start)
It’s a little wonky I know/not fool-proof, but I just need it to execute on a large dataset. It works when I test it outside of the function for some reason, not sure why it just repeats 01/08/2024 all the way through. Any help would be greatly appreciated. I am still learning.