Why does a pyarrow backend df need more RAM than a numpy backend?
I am reading a large parquet file with int, string and date columns. When I use dtype_backend="pyarrow" instead of the default dtype_backend="numpy_nullable", df.info() reports 15.6 GB instead of 14.6 GB. For other datasets I have seen an even larger relative overhead with the pyarrow backend.
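For reference, a minimal sketch of the comparison (the file path is a placeholder, since the question does not show its code; requires pandas 2.0+ for the dtype_backend argument):

```python
import pandas as pd

path = "data.parquet"  # placeholder path

# Read the same file with both backends.
df_np = pd.read_parquet(path, dtype_backend="numpy_nullable")
df_pa = pd.read_parquet(path, dtype_backend="pyarrow")

# memory_usage="deep" makes pandas measure string/object columns exactly
# instead of estimating them, which matters for string-heavy data.
df_np.info(memory_usage="deep")
df_pa.info(memory_usage="deep")
```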
pyarrow.lib.ArrowCapacityError when creating a string column
I'd like to create a new string column, but pandas with the pyarrow backend throws an ArrowCapacityError: array cannot contain more than 2147483646 bytes, have 3525828799.
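A minimal sketch, assuming the new column is an Arrow-backed string Series (the data here is a placeholder). Arrow's plain string type stores offsets as 32-bit integers, so a single contiguous array cannot hold more than roughly 2 GiB of character data, which matches the 2147483646-byte limit in the error; casting to large_string, which uses 64-bit offsets, is one possible workaround:

```python
import pandas as pd
import pyarrow as pa

# Placeholder data; a real column with > 2 GiB of total string bytes
# triggers the ArrowCapacityError when stored as a plain Arrow string.
s = pd.Series(["some text"] * 1_000, dtype=pd.ArrowDtype(pa.string()))

# Cast to large_string (64-bit offsets) to lift the ~2 GiB per-array limit.
s_large = s.astype(pd.ArrowDtype(pa.large_string()))
print(s_large.dtype)  # large_string[pyarrow]
```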
get seconds from pandas timedelta with pyarrow dtype
I have a dataframe with pyarrow dtypes such as `duration[ns][pyarrow]`.
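A minimal sketch of extracting seconds from such a column; the data is a placeholder, since the question only shows the dtype:

```python
import pandas as pd
import pyarrow as pa

# Placeholder timedelta column with the pyarrow duration dtype.
td = pd.Series(pd.to_timedelta(["1h", "90s", "2d"])).astype(pd.ArrowDtype(pa.duration("ns")))

# Recent pandas versions expose the .dt accessor for pyarrow duration dtypes.
print(td.dt.total_seconds())

# Fallback that avoids the Arrow-backed accessor: convert to numpy
# timedelta64[ns] first, then use the regular .dt.total_seconds().
print(td.astype("timedelta64[ns]").dt.total_seconds())
```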