Why does a DataFrame with the pyarrow backend need more RAM than one with the numpy backend?

I am reading a large Parquet file with int, string, and date columns.
With dtype_backend="pyarrow" instead of the default dtype_backend="numpy_nullable", df.info() reports 15.6 GB of memory usage rather than 14.6 GB.
With other datasets, the relative overhead of the pyarrow backend was even larger.