I need some help handling the new “PerformanceWarning: Falling back on a non-pyarrow code path which may decrease performance” reported by dask when running operations over some non-pyarrow dtypes.
As far as I've read, pyarrow dtypes offer better performance and lower memory consumption (that's great!), but I am struggling to understand how to convert my parquet/pandas dataframe to arrow types before running my operations.
Sample dataframe:
import dask.dataframe as dd
import pandas as pd
import numpy as np
dataframe = pd.DataFrame({
'status' : ['pending', 'pending','pending', 'canceled','canceled','canceled', 'confirmed', 'confirmed','confirmed'],
'clientId' : ['A', 'B', 'C', 'A', 'D', 'C', 'A', 'B','C'],
'partner' : ['A', np.nan,'C', 'A',np.nan,'C', 'A', np.nan,'C'],
'product' : ['afiliates', 'pre-paid', 'giftcard','afiliates', 'pre-paid', 'giftcard','afiliates', 'pre-paid', 'giftcard'],
'brand' : ['brand_1', 'brand_2', 'brand_3','brand_1', 'brand_2', 'brand_3','brand_1', 'brand_3', 'brand_3'],
'gmv' : [100,100,100,100,100,100,100,100,100]})
dataframe = dataframe.astype({'clientId':'string','partner':'string','status':'string','product':'string', 'brand':'string'})
dataframe = dd.from_pandas(dataframe, npartitions = 1)
Now I run the str replace operation:
dataframe['brand_test'] = dataframe['brand']
dataframe['brand_test'] = dataframe['brand_test'].str.replace('brand_1', 'PlayStation', case=False, regex = False)
And I get this warning for every single row in my DataFrame:
c:\Users\F3164582\AppData\Local\Programs\Python\Python311\Lib\site-packages\dask_expr\_accessor.py:102: PerformanceWarning: Falling back on a non-pyarrow code path which may decrease performance.
out = getattr(getattr(obj, accessor, obj), attr)(*args, **kwargs)
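So far the only way I have come up with to hide the message is a blanket filter. This is a minimal sketch, assuming the warning is pandas' `pd.errors.PerformanceWarning` (the helper name `silence_performance_warnings` is my own). It only hides the message and does not change which code path runs:

```python
import warnings

import pandas as pd


def silence_performance_warnings():
    """Ignore pandas PerformanceWarning for the rest of the process.

    Assumption: the message dask prints is a pandas.errors.PerformanceWarning.
    """
    warnings.filterwarnings("ignore", category=pd.errors.PerformanceWarning)


# Call once, before building or computing the dask graph.
silence_performance_warnings()
```

With a distributed scheduler I suppose the filter would also have to be set on the workers, but I have not tried that.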
1) How can I silence this warning, if I want to?
2) How do I “solve” this? Could I change the datatype to pyarrow string before the replace operation?
3) If pyarrow dtypes are so much better, how do I convert my whole dataframe to the equivalent pyarrow datatypes? Why don't pandas/dask just convert it automatically?
Thanks in advance!