Convert a date column loaded as strings into a pandas dataframe, in the format DDMMMYYYY
(i.e. %d%b%Y
) to the desired format of YYYYMMDD
(i.e. %Y%m%d
).
Example: Convert 31Oct2023
to 20231031
.
GivenInput ExpectedDates
0 31Jan2023 20230131
1 28Feb2023 20230228
2 31Mar2023 20230331
3 30Apr2023 20230430
4 31May2023 20230531
5 30Jun2023 20230630
6 31Jul2023 20230731
7 31Aug2023 20230831
8 30Sep2023 20230930
9 31Oct2023 20231031
10 30Nov2023 20231130
11 31Dec2023 20231231
12 31Jan2024 20240131
13 29Feb2024 20240229
14 31Mar2024 20240331
15 30Apr2024 20240430
16 31May2024 20240531
0
Solution
Strategy: raw-date-string
–> datetime
–> string format as '%Y%m%d'
A. Snippet with Summary
Convert the string values of the date into type 'datetime64[ns]'
. This makes pandas recognize the string date-values as datetime values. The .dt.strftime('%Y%m%d')
further converts these datetime values into strings in the expected format, %Y%m%d
.
NOTE: The dataframe
df
was created with the code snippet in section: C. Dummy Data.
df["ExpectedOutput"] = df["GivenInput"].astype('datetime64[ns]').dt.strftime('%Y%m%d')
print(df)
Output:
GivenInput ExpectedDates
0 31Jan2023 20230131
1 28Feb2023 20230228
2 31Mar2023 20230331
3 30Apr2023 20230430
4 31May2023 20230531
5 30Jun2023 20230630
6 31Jul2023 20230731
7 31Aug2023 20230831
8 30Sep2023 20230930
9 31Oct2023 20231031
10 30Nov2023 20231130
11 31Dec2023 20231231
12 31Jan2024 20240131
13 29Feb2024 20240229
14 31Mar2024 20240331
15 30Apr2024 20240430
16 31May2024 20240531
B. Notes for the Inquisitive
The following conversion loads the data as datetime; however, the pandas display show the datetime values as follows.
31Oct2023
–>2023-10-31
df["DateTimeAsIs"] = df["GivenInput"].astype('datetime64[ns]')
print(df)
Output:
GivenInput ExpectedOutput DateTimeAsIs
0 31Jan2023 20230131 2023-01-31
1 28Feb2023 20230228 2023-02-28
2 31Mar2023 20230331 2023-03-31
3 30Apr2023 20230430 2023-04-30
4 31May2023 20230531 2023-05-31
5 30Jun2023 20230630 2023-06-30
6 31Jul2023 20230731 2023-07-31
7 31Aug2023 20230831 2023-08-31
8 30Sep2023 20230930 2023-09-30
9 31Oct2023 20231031 2023-10-31
10 30Nov2023 20231130 2023-11-30
11 31Dec2023 20231231 2023-12-31
12 31Jan2024 20240131 2024-01-31
13 29Feb2024 20240229 2024-02-29
14 31Mar2024 20240331 2024-03-31
15 30Apr2024 20240430 2024-04-30
16 31May2024 20240531 2024-05-31
C. Dummy Data
Snippet to load the sample data.
import pandas as pd
from io import StringIO
s = """
GivenInput
31Jan2023
28Feb2023
31Mar2023
30Apr2023
31May2023
30Jun2023
31Jul2023
31Aug2023
30Sep2023
31Oct2023
30Nov2023
31Dec2023
31Jan2024
29Feb2024
31Mar2024
30Apr2024
31May2024
"""
s = s.strip()
df = pd.read_csv(StringIO(s), sep=',')
print(df)