Is there a way to validate Python objects against a pyarrow schema?
I have a pyarrow Schema defined and a list of native Python dictionaries. I can use
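The snippet above is cut off, but one common way to check a list of dicts against a schema (not necessarily what the asker had in mind) is simply to attempt the conversion and catch the Arrow error. A minimal sketch, assuming a small example schema:

```python
import pyarrow as pa

schema = pa.schema([
    ("name", pa.string()),
    ("age", pa.int32()),
])

rows = [
    {"name": "alice", "age": 30},
    {"name": "bob", "age": "not-a-number"},  # intentionally invalid
]

def validate(rows, schema):
    """Return None if the rows fit the schema, else the conversion error."""
    try:
        # from_pylist casts each dict to the declared schema and raises
        # if a value cannot be converted to the declared type.
        pa.Table.from_pylist(rows, schema=schema)
        return None
    except (pa.ArrowInvalid, pa.ArrowTypeError) as exc:
        return exc

err = validate(rows, schema)
print("valid" if err is None else f"invalid: {err}")
```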
AttributeError: module 'pyarrow.lib' has no attribute 'ListViewType'
AttributeError                            Traceback (most recent call last)
Cell In[67], line 2
      1 import pandas as pd
----> 2 import pyarrow.parquet as pq
      3 import pyarrow.lib as _lib
      4 import pyarrow as pa

File /opt/anaconda3/lib/python3.11/site-packages/pyarrow/parquet/__init__.py:20
      1 # Licensed to the Apache Software Foundation (ASF) under one
      2 # or more contributor license agreements. See the NOTICE file
[…]
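This error usually points at a stale or mixed installation: the compiled pyarrow.lib being imported is older than what pyarrow.parquet expects (ListViewType, as far as I know, only exists in newer pyarrow releases). A minimal sanity check, assuming a standard pip/conda environment:

```python
# Confirm which pyarrow installation is actually being imported
# and what version it reports; a mismatch between an old compiled
# pyarrow.lib and a newer pyarrow.parquet typically needs a clean reinstall.
import pyarrow as pa

print(pa.__version__)   # version that defines (or not) ListViewType
print(pa.__file__)      # path shows which installation is picked up
```

If the version or path looks wrong, a clean reinstall (for example `pip install --upgrade --force-reinstall pyarrow`, or the conda-forge package) is the usual fix.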
Converting pyarrow Table to RecordBatches of a fixed byte size
I appreciate this might be impossible in the most general case, but is there a way to convert a pyarrow Table into record batches of a fixed maximum byte size? I know I can request a maximum number of rows per batch, so this should be possible if I can work out the maximum byte size of a row for a given schema. I didn't see anything in the docs of either pa.Schema or pa.Field about their sizes.
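PyArrow doesn't, as far as I can tell, publish a maximum row size per schema, and for variable-width columns (strings, lists) there isn't a fixed one. A rough workaround is to estimate the average bytes per row from the table itself and derive a `max_chunksize` from that; a sketch, with the caveat that skewed variable-width data can still overshoot the target:

```python
import pyarrow as pa

def batches_by_approx_bytes(table: pa.Table, target_bytes: int):
    """Yield record batches whose size is roughly bounded by target_bytes.

    Uses the table's average bytes per row as an estimate, so columns with
    very uneven value sizes can still produce oversized batches.
    """
    if table.num_rows == 0:
        yield from table.to_batches()
        return
    avg_row_bytes = max(1, table.nbytes // table.num_rows)
    rows_per_batch = max(1, target_bytes // avg_row_bytes)
    yield from table.to_batches(max_chunksize=rows_per_batch)

# Example: aim for batches of roughly 1 MiB.
table = pa.table({"x": list(range(1_000_000))})
for batch in batches_by_approx_bytes(table, 1 << 20):
    print(batch.num_rows, batch.nbytes)
```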
Representing Python enums in PyArrow and Parquet files
Do PyArrow (Arrow) and its Parquet file format have a native way to serialise Python enums? If not, what would be a storage-efficient way to encode the enum strings, so that preferably other Parquet readers could also understand them?
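Arrow has no dedicated enum type that I'm aware of. The usual storage-efficient representation is a dictionary-encoded string column: Parquet stores each distinct string once plus small integer indices, and other Parquet readers simply see a string column. A sketch using a hypothetical Color enum:

```python
import enum
import pyarrow as pa
import pyarrow.parquet as pq

class Color(enum.Enum):          # hypothetical enum for illustration
    RED = "red"
    GREEN = "green"
    BLUE = "blue"

values = [Color.RED, Color.BLUE, Color.RED, Color.GREEN]

# Store the enum *values* as strings, then dictionary-encode so the file
# keeps one copy of each distinct string plus compact integer indices.
arr = pa.array([c.value for c in values]).dictionary_encode()
table = pa.table({"color": arr})
pq.write_table(table, "colors.parquet")

# Readers that know nothing about the Python enum still see strings;
# mapping back on the Python side is a plain lookup.
read_back = pq.read_table("colors.parquet").column("color").to_pylist()
restored = [Color(v) for v in read_back]
```

Writing `c.value` rather than `c.name` is an arbitrary choice here; either works as long as the reading code maps the strings back consistently.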
Does the Arrow file format convert pandas datetime64[ns] timestamp values to nanoseconds?
I have a pandas dataframe with a datetime column containing the following values:
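The actual values are truncated above, so the following sketch uses placeholder timestamps; the point is simply that a datetime64[ns] column comes through as an Arrow timestamp[ns] column, i.e. the data stays as nanosecond counts since the Unix epoch:

```python
import pandas as pd
import pyarrow as pa

# Placeholder data; the question's actual column values are not shown.
df = pd.DataFrame(
    {"ts": pd.to_datetime(["2021-01-01 00:00:00.123456789",
                           "2021-01-02 12:34:56.000000001"])}
)
print(df["ts"].dtype)                 # datetime64[ns]

table = pa.Table.from_pandas(df)
print(table.schema.field("ts").type)  # timestamp[ns]
# The stored values are nanosecond counts since the epoch:
print(table.column("ts").cast(pa.int64()).to_pylist())
```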