I am getting the following error: “pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 3036-12-31 00:00:00, at position 45100”
I don't want to do the following, because it coerces every value that fails to parse, including these far-future dates, to NaT (not a time):
s = pd.to_datetime(s, errors='coerce')
Is there a way to keep these dates instead of converting them to NaT/NaN/null?
This is the function causing the error:
def remove_duplicates_based_on_keydate(df, id_col, date_col):
    df = df.copy()
    df.loc[:, date_col] = pd.to_datetime(df[date_col])
    # Sort by the date column in descending order
    df_sorted = df.sort_values(by=date_col, ascending=False)
    # Drop duplicates, keeping the first occurrence (latest date)
    df_unique = df_sorted.drop_duplicates(subset=id_col, keep='first')
    # Sort again by id_col and reset index
    df_unique = df_unique.sort_values(by=id_col).reset_index(drop=True)
    return df_unique
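
For context, nanosecond-resolution timestamps in pandas only reach into the year 2262 (see pd.Timestamp.max), so 3036-12-31 cannot be stored at that resolution. One possible workaround, shown here only as a minimal sketch, is to skip pd.to_datetime and keep the dates as standard-library datetime objects in an object-dtype column, which still sorts chronologically. The function name latest_row_per_id, the fmt parameter, and the sample column names are hypothetical, and the sketch assumes the date column holds strings in a single known format with no missing values.

```python
from datetime import datetime

import pandas as pd


def latest_row_per_id(df, id_col, date_col, fmt="%Y-%m-%d"):
    """Keep the row with the latest date per id without using datetime64[ns]."""
    df = df.copy()
    # Parse with the standard library; datetime covers years 1 to 9999.
    # Assumption: every value is a string matching fmt (no missing values).
    parsed = [datetime.strptime(str(value), fmt) for value in df[date_col]]
    # dtype=object stops pandas from converting to nanosecond timestamps.
    df[date_col] = pd.Series(parsed, index=df.index, dtype=object)
    # Sort by the date column in descending order
    df_sorted = df.sort_values(by=date_col, ascending=False)
    # Drop duplicates, keeping the first occurrence (latest date)
    df_unique = df_sorted.drop_duplicates(subset=id_col, keep="first")
    # Sort again by id_col and reset index
    return df_unique.sort_values(by=id_col).reset_index(drop=True)


# Hypothetical sample data with one date beyond the nanosecond range
sample = pd.DataFrame(
    {
        "id": [1, 1, 2],
        "key_date": ["2020-01-01", "3036-12-31", "2021-06-30"],
    }
)
result = latest_row_per_id(sample, "id", "key_date")
print(result)
```

The trade-off is that an object-dtype column loses the .dt accessor and vectorized datetime arithmetic, so this fits best when the dates are only needed for sorting and comparison.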