I’m trying to convert the contents of a str timestamp column to datetime. The input timestamp strings don’t match STANDARD_DATETIME_FORMAT (they carry fractional seconds and a UTC offset), but Pandas 1.5.3 does the conversion just fine:
import pandas as pd

data = {
    'timestamp': [
        '2024-05-02 10:00:00.000000+0000',
        '2024-05-02 10:00:01.000000+0000',
        '2024-05-02 10:00:02.000000+0000',
        '2024-05-02 10:00:03.000000+0000',
        '2024-05-02 10:00:04.000000+0000'
    ],
    'value': [False, False, False, False, False]
}
df = pd.DataFrame(data)

STANDARD_DATETIME_FORMAT = '%Y-%m-%d %H:%M:%S'
pd.to_datetime(df['timestamp'], format=STANDARD_DATETIME_FORMAT)
In our code, we were using Pandas 1.5.3 with Python 3.8, but now we’re updating to 2.2.2 with Python 3.12.
With 1.5.3 + Python 3.8, there seems to be some kind of automatic format detection, even though it isn’t explicitly requested in the function arguments.
With 2.2.2 + Python 3.12, I get the following error from the code snippet above:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[9], line 1
----> 1 pd.to_datetime(df['timestamp'], format=STANDARD_DATETIME_FORMAT)
File ~/tmp/test-pandas/.venv3_12/lib/python3.12/site-packages/pandas/core/tools/datetimes.py:1067, in to_datetime(arg, errors, dayfirst, yearfirst, utc, format, exact, unit, infer_datetime_format, origin, cache)
1065 result = arg.map(cache_array)
1066 else:
-> 1067 values = convert_listlike(arg._values, format)
1068 result = arg._constructor(values, index=arg.index, name=arg.name)
1069 elif isinstance(arg, (ABCDataFrame, abc.MutableMapping)):
File ~/tmp/test-pandas/.venv3_12/lib/python3.12/site-packages/pandas/core/tools/datetimes.py:433, in _convert_listlike_datetimes(arg, format, name, utc, unit, errors, dayfirst, yearfirst, exact)
431 # `format` could be inferred, or user didn't ask for mixed-format parsing.
432 if format is not None and format != "mixed":
--> 433 return _array_strptime_with_fallback(arg, name, utc, format, exact, errors)
435 result, tz_parsed = objects_to_datetime64(
436 arg,
437 dayfirst=dayfirst,
(...)
441 allow_object=True,
442 )
444 if tz_parsed is not None:
445 # We can take a shortcut since the datetime64 numpy array
446 # is in UTC
File ~/tmp/test-pandas/.venv3_12/lib/python3.12/site-packages/pandas/core/tools/datetimes.py:467, in _array_strptime_with_fallback(arg, name, utc, fmt, exact, errors)
456 def _array_strptime_with_fallback(
457 arg,
458 name,
(...)
462 errors: str,
463 ) -> Index:
464 """
465 Call array_strptime, with fallback behavior depending on 'errors'.
466 """
--> 467 result, tz_out = array_strptime(arg, fmt, exact=exact, errors=errors, utc=utc)
468 if tz_out is not None:
469 unit = np.datetime_data(result.dtype)[0]
File strptime.pyx:501, in pandas._libs.tslibs.strptime.array_strptime()
File strptime.pyx:451, in pandas._libs.tslibs.strptime.array_strptime()
File strptime.pyx:587, in pandas._libs.tslibs.strptime._parse_with_format()
ValueError: unconverted data remains when parsing with format "%Y-%m-%d %H:%M:%S": ".000000+0000", at position 0. You might want to try:
- passing `format` if your strings have a consistent format;
- passing `format='ISO8601'` if your strings are all ISO8601 but not necessarily in exactly the same format;
- passing `format='mixed'`, and the format will be inferred for each element individually. You might want to use `dayfirst` alongside this.
In 2.2.2 + Python 3.12, for this to work I need to fix STANDARD_DATETIME_FORMAT to match the input timestamp str.
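For reference, here is a minimal sketch of what a format matching the sample strings might look like. It assumes every value carries six fractional digits and a numeric UTC offset such as +0000; the name FULL_DATETIME_FORMAT is made up for illustration and is not part of the original code.

```python
import pandas as pd

# Hypothetical name: a format that also covers the fractional seconds (%f)
# and the numeric UTC offset (%z) present in the sample data.
FULL_DATETIME_FORMAT = '%Y-%m-%d %H:%M:%S.%f%z'

timestamps = pd.Series([
    '2024-05-02 10:00:00.000000+0000',
    '2024-05-02 10:00:01.000000+0000',
    '2024-05-02 10:00:02.000000+0000',
])

# Every character of each string is now accounted for by the format,
# so strict matching has nothing left over to complain about.
parsed = pd.to_datetime(timestamps, format=FULL_DATETIME_FORMAT)
print(parsed)
```

Because the format includes %z, the result is timezone-aware rather than naive, which may matter for code downstream that expects naive timestamps.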
I went on to read the function reference documentation but, in 1.5.3, the parameters that could affect this outcome already have the expected defaults: exact=True and infer_datetime_format=False.
I’ll be fixing the format, but I’d like to know exactly what changed from 1.5.3 to 2.x.x in this respect.