I am trying to remove duplicate rows from a DataFrame. Here is the sample data and how I call my function:
    df = pd.DataFrame({'id_SAP_transaction': [1, 2, 2, 3, 3],
                       'checkout_security': ['2023-12-15', pd.NaT, '2023-12-01', pd.NaT, '2023-11-30'],
                       'nopol': ['AA123', 'BB456', 'CC789', 'DD101', 'EE234']})
    processed_df = process_dataframe(df.copy())
For each duplicated id_SAP_transaction, I want the following: if checkout_security contains both NaT and a date, keep the row with the date; if checkout_security contains only NaT, keep any one of the rows; and if checkout_security contains only dates, keep the row with the latest date.
Here is my code:
    def process_dataframe(df, no_date_value=pd.NaT):
        """Processes the DataFrame for duplicate removal and checkout_security handling.

        Args:
            df (pandas.DataFrame): The DataFrame to process.
            no_date_value (pd.NaT): Value representing a missing checkout date.

        Returns:
            pandas.DataFrame: The processed DataFrame.
        """
        try:
            # Ensure 'checkout_security' is datetime
            df['checkout_security'] = pd.to_datetime(
                df['checkout_security'], errors='coerce')

            # Sort by checkout_security (descending, NaT last)
            df = df.sort_values(by='checkout_security',
                                ascending=False, na_position='last')

            def handle_duplicates(group):
                # If there's at least one valid date
                if not group['checkout_security'] is pd.NaT:
                    print(group)
                    idx = group.index[0]
                    return group.loc[idx]  # Return the row using the index
                else:
                    first_row = group.iloc[0]
                    return first_row

            # Remove duplicates based on 'id_SAP_transaction'
            df = df.drop_duplicates(subset='id_SAP_transaction', keep='last').apply(
                handle_duplicates, axis=1)
            return df
        except (ValueError, KeyError) as e:
            print(f"Error during DataFrame processing: {e}")
            return df  # Optionally, return partially processed DF
        except Exception as e:
            print(f"An error occurred: {type(e).__name__} - {e}")
            return df  # Optionally, return partially processed DF
Instead of getting this expected result:

    0  1  2023-12-15  AA123
    2  2  2023-12-01  CC789
    4  3  2023-11-30  EE234

I got this:

    0  1  2023-12-15  AA123
    2  2  2023-12-01  CC789
    4  3  2023-11-30  EE234
    1  2  NaT         BB456
    3  3  NaT         DD101
Any help with this would be appreciated.
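
For reference, the rule described above can probably be expressed without a custom per-group function. One thing worth noting is that `apply(..., axis=1)` runs once per row, after `drop_duplicates` has already discarded rows, so `handle_duplicates` never sees a whole group. The following is only a minimal sketch, not a definitive fix: the name `dedupe_keep_latest` is hypothetical, and it assumes that when a group mixes NaT with several dates, the latest date is the one to keep. It sorts so that the latest date comes first and NaT comes last, then keeps the first row per id.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'id_SAP_transaction': [1, 2, 2, 3, 3],
    'checkout_security': ['2023-12-15', pd.NaT, '2023-12-01', pd.NaT, '2023-11-30'],
    'nopol': ['AA123', 'BB456', 'CC789', 'DD101', 'EE234'],
})


def dedupe_keep_latest(df):
    """Hypothetical helper: keep one row per id_SAP_transaction.

    Assumption: a real date always beats NaT, and among several
    dates the latest one wins.
    """
    out = df.copy()
    # Make sure the column is datetime; unparseable values become NaT.
    out['checkout_security'] = pd.to_datetime(
        out['checkout_security'], errors='coerce')
    # Latest dates first, NaT at the very end.
    out = out.sort_values('checkout_security',
                          ascending=False, na_position='last')
    # The first row per id is now the latest date, or a NaT row
    # if the id has no date at all.
    out = out.drop_duplicates(subset='id_SAP_transaction', keep='first')
    # Restore the original row order.
    return out.sort_index()


result = dedupe_keep_latest(df)
print(result)
```

With the sample data this keeps the rows at index 0, 2 and 4, which matches the expected result. The key difference from the original code is `keep='first'` after the descending sort; `keep='last'` keeps the NaT rows instead, because they are sorted to the end.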