I have a normal pandas DataFrame which contains some string columns. I am converting the dtypes to string[pyarrow]:
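    import pandas as pd

    # exchange.symbols is defined elsewhere; each item is indexed by 'exchange' and 'symbol'
    symbol_exch = pd.DataFrame(
        [
            {'exchange': s['exchange'], 'symbol': s['symbol']}
            for s in exchange.symbols
        ],
    )
    # cast both columns to the pyarrow-backed string dtype
    symbol_exch = symbol_exch.astype(
        dtype={'exchange': 'string[pyarrow]', 'symbol': 'string[pyarrow]'},
    )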
As can be seen, the dtypes are reported as string[pyarrow]:
    >>> symbol_exch.dtypes
    exchange    string[pyarrow]
    symbol      string[pyarrow]
    dtype: object
I now have a dask.dataframe which I am trying to merge with the above:
    import dask.dataframe

    df = dask.dataframe.from_delayed(tasks, meta)
    df = df.merge(symbol_exch, on='symbol', how='left')
Whilst the operation completes successfully, dask issues a warning:
    /usr/local/lib/python3.10/site-packages/dask/dataframe/multi.py:520: UserWarning: Merging dataframes with merge column data type mismatches:
    +----------------------+-----------------+-------------+
    | Merge columns        | left dtype      | right dtype |
    +----------------------+-----------------+-------------+
    | ('symbol', 'symbol') | string[pyarrow] | string      |
    +----------------------+-----------------+-------------+
    Cast dtypes explicitly to avoid unexpected results.
      warnings.warn(
Even though I explicitly set the dtypes in the right-hand dataframe, the warning is still issued.
What do I need to do to my symbol_exch dataframe in order to suppress this warning?
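A minimal sketch for narrowing this down, under stated assumptions. The warning table shows the two sides as string[pyarrow] and string, and the printed name alone may not show which dtype class or storage is behind each side. The helpers describe_dtype and align_key_dtype and the frames left and right are hypothetical stand-ins, not part of my real code. The demo uses the plain "string" dtype so that it runs without pyarrow. It only shows how the two dtype objects could be compared and how the right-hand key could be cast to the exact dtype object of the left-hand key, as the warning suggests. Whether that silences the dask warning is not verified here.

```python
import pandas as pd


def describe_dtype(dtype):
    """Return the class name, repr and str of a dtype; these can differ."""
    return {"class": type(dtype).__name__, "repr": repr(dtype), "str": str(dtype)}


def align_key_dtype(left, right, key):
    """Return a copy of `right` whose `key` column uses the dtype of `left[key]`."""
    return right.astype({key: left[key].dtype})


# Hypothetical stand-ins for the real frames (assumption: pure pandas, no dask).
left = pd.DataFrame({"symbol": pd.array(["AAA", "BBB"], dtype="string")})
right = pd.DataFrame({"symbol": ["AAA", "BBB"], "exchange": ["X", "Y"]})

# Compare what each side really is, beyond the short printed name.
print(describe_dtype(left["symbol"].dtype))
print(describe_dtype(right["symbol"].dtype))

# Cast the right-hand key to the left-hand key's dtype object, then merge.
right_aligned = align_key_dtype(left, right, "symbol")
merged = left.merge(right_aligned, on="symbol", how="left")
print(merged.dtypes)
```

On a dask DataFrame, df['symbol'].dtype should give the left-hand dtype in the same way, so the same cast could be applied to symbol_exch before calling df.merge. That part is an assumption and is not exercised by the sketch.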