I have a pandas DataFrame with a timezone-aware datetime column containing the following values:
0 2020-01-01 23:00:57+00:00
1 2021-01-01 23:00:57+00:00
2 2022-01-01 23:00:57+00:00
3 2023-01-01 23:00:57+00:00
4 2024-01-01 23:00:57+00:00
When I write this DataFrame to an Arrow file using the following code:
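import pyarrow as pa
table = pa.Table.from_pandas(df)  # assumption: df is the DataFrame shown above
schema = table.schema             # assumption: schema is taken from the table
writer = pa.ipc.new_file("/data.arrow", schema)
writer.write(table)
writer.close()                    # close the writer so the file footer is written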
I observed that the values come back as long integers (nanoseconds) when I read the file in Java.
However, when the DataFrame column only has dates:
2022-01-01
2022-01-02
2022-01-03
2022-01-04
I get actual date values in a readable datetime format.
Is this expected behavior in pyarrow, perhaps to improve I/O efficiency?