I need some help using Python to make corrections to an Excel file:
Original Table:
ArrivalDate | ArrivalTime |
---|---|
2024-01-01 | 2024-01-01 12:00:01 PM |
2024-04-01 | 1990-01-00 12:01:00 PM |
2024-03-01 | 12:03:15 AM |
2024-01-01 | 1990-01-00 |
2024-04-01 | |
Desired Table:
ArrivalDate | ArrivalTime |
---|---|
2024-01-01 | 2024-01-01 12:00:01 PM |
2024-04-01 | 2024-04-01 12:01:00 PM |
2024-03-01 | 2024-03-01 12:03:15 AM |
2024-01-01 | Missing Time |
2024-04-01 | Need more Info |
I used pandas in Python to try to automate the process.
The following is my code:
import pandas as pd
# Read the file
df = pd.read_excel("path/excelfile.xls")
# Convert into datetime format for processing
df['ArrivalDate'] = pd.to_datetime(df['ArrivalDate'])
# For arrival time, keep only the time component
df['ArrivalTime'] = pd.to_datetime(df['ArrivalTime'], format='%Y-%m-%d %H:%M', errors='coerce').dt.time
# Replace the dates in ArrivalTime with the dates in the ArrivalDate column
df['ArrivalTime'] = df.apply(lambda row: pd.to_datetime(row['ArrivalDate'].strftime('%Y-%m-%d') + " " + row['ArrivalTime'].strftime('%H:%M')) if pd.notnull(row['ArrivalTime']) else None, axis=1)
print(df)
This helps me generate the table, but there are still some problems:
- The time displays as 2024-04-01 21:01:00, although I defined the format as "%Y-%m-%d %H:%M".
- Some times start with 1990-01-00; the code can't recognize those dates and returns NaT.
How can I fix these problems?
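For reference, here is a rough sketch of the direction I think this could take, not a confirmed solution. It assumes the ArrivalTime cells arrive as text in the shapes shown in the tables above (the sample DataFrame is made up to mirror them, and the variable names are only illustrative), and that a blank cell is read as NaN. The idea is to pull out only the clock part with a regular expression, so the invalid 1990-01-00 date is never parsed, and to format the result back to a string at the end, since a datetime column does not store a display format.

```python
import pandas as pd

# Hypothetical sample data mirroring the "Original Table" above.
df = pd.DataFrame({
    "ArrivalDate": ["2024-01-01", "2024-04-01", "2024-03-01", "2024-01-01", "2024-04-01"],
    "ArrivalTime": ["2024-01-01 12:00:01 PM", "1990-01-00 12:01:00 PM",
                    "12:03:15 AM", "1990-01-00", None],
})

# Remember which cells were empty before anything is overwritten.
was_blank = df["ArrivalTime"].isna()

# Keep only the clock part (e.g. "12:01:00 PM"); rows without one become NaN.
time_text = df["ArrivalTime"].str.extract(r"(\d{1,2}:\d{2}:\d{2}\s*[AP]M)", expand=False)

# Build "date + time" strings from the trusted ArrivalDate column and parse them.
date_text = pd.to_datetime(df["ArrivalDate"]).dt.strftime("%Y-%m-%d")
combined = pd.to_datetime(date_text + " " + time_text,
                          format="%Y-%m-%d %I:%M:%S %p", errors="coerce")

# Format back to text for display, then label the rows that had no usable time.
formatted = combined.dt.strftime("%Y-%m-%d %I:%M:%S %p")
df["ArrivalTime"] = (formatted
                     .where(combined.notna(), "Missing Time")
                     .mask(was_blank, "Need more Info"))

print(df)
```

The final column holds strings rather than datetimes, which is what allows the "Missing Time" and "Need more Info" labels to sit next to the formatted timestamps.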