Update:
I was able to work around the issue by converting the column to NumPy first with to_numpy() and then reshaping it.
Before edit:
I have a Python program where I use a Polars DataFrame to read from a file and then loop through it. I also have a function that is built on pandas and that I can't change. Because that function expects pandas, I convert the Polars DataFrame to pandas before passing it in.
To convert the DataFrames, I am using:
df = df_pl.to_pandas(use_pyarrow_extension_array=True)
Using PyArrow improves performance immensely, but it also seems to cause an error when I later try to reshape data from the DataFrame.
I get an exception when I run this code:
X_test = validation.year.array.reshape(1, -1)
<class 'pandas.core.arrays.arrow.array.ArrowExtensionArray'> does not support reshape as backed by a 1D pyarrow.ChunkedArray.
Is there a way to keep using PyArrow and still reshape this array?
Here is some example code to make the problem easier to understand:
# using polars
import polars as pl
import pandas as pd
data = {"group1": [1, 1, 2, 3], "group2": [3, 3, 3, 2], "group3": [1, 1, 4, 3], "group4": [3, 3, 3, 1], "year": [2020, 2019, 2020, 2018]}
df_pl = pl.DataFrame(data)
for group in df_pl.group_by("group1", "group2", "group3", "group4"):
    print(group)
# convert to pandas with Arrow-backed columns
df_pd = df_pl.to_pandas(use_pyarrow_extension_array=True)
split_point = len(df_pd) - 1
df_1, validation = df_pd[0:split_point].copy(), df_pd[split_point:]
test = validation.year.array  # .reshape(1, -1) raises here on an ArrowExtensionArray
print(test)
I have commented out the reshape here. I also tried the same thing with pandas directly.
The issue is that when I work with pandas directly, the column is backed by a NumPy array, but the conversion from Polars to pandas creates a PyArrow-backed array (ArrowExtensionArray), and that array doesn't seem to support reshaping.