I have some code that tries to extract all rows between two dates. In Dask I tried two solutions. First, the obvious way, with t1 and t2 being numpy.datetime64:
df[(df["date"] >= t1) & (df["date"] <= t2)]
which works. I was wondering why using set_index and then indexing didn't work, though:
df.set_index("date", sorted=True)
df.loc[t1:t2].compute()
This gives multiple exceptions:
KeyError: numpy.datetime64('2023-10-28T06:00:00.000000000')
numpy.exceptions.DTypePromotionError: The DType <class 'numpy.dtypes.DateTime64DType'> could not be promoted by <class 'numpy.dtypes.Int32DType'>. This means that no common DType exists for the given inputs. For example they cannot be stored in a single array unless the dtype is `object`. The full list of DTypes is: (<class 'numpy.dtypes.DateTime64DType'>, <class 'numpy.dtypes.Int32DType'>)
numpy.core._exceptions.UFuncTypeError: ufunc 'greater' did not contain a loop with signature matching types (<class 'numpy.dtypes.DateTime64DType'>, <class 'numpy.dtypes.Int32DType'>) -> None
I tried wrapping with to_datetime() from pandas and dask with no luck. Is this a type error, or am I misusing the set_index interface?