Dask – How to optimize the computation of the first row of each partition in a dask dataframe?
My overall goal is to read several CSV files, do some computation on them, and save the result as a partitioned Parquet dataset using the `partition_on` option of the `to_parquet` function.