I have an async method that I want to run on every row in all partitions of a Dask dataframe:
async def async_api_call(text: str):
return text[:3]
Here is an example dataset with 100 rows of text:
import lorem
import pandas as pd
df = pd.DataFrame({"Text": [lorem.text() for _ in range(100)]})
ddf = dd.from_pandas(df, npartitions=4)
What I try to do is:
ddf["new"] = ddf.map_partitions(
lambda partition: partition["Text"].apply(async_api_call),
meta=(None, str),
)