Whenever I need to export data from BigQuery to Parquet, I end up choosing between two options. Either I use:

- `dask-bigquery`, which takes around 40 min for my dataset and outputs 700 files of ~12MB each; or
- the `EXPORT DATA ... AS` statement, letting BQ execute the export itself, which takes around 2 min but generates 10,000 files of ~200KB each, making the output unusable without an expensive repartitioning step afterwards.
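For reference, here is roughly what I'm running for each option (project, dataset, table, and bucket names below are placeholders):

```python
import dask_bigquery
from google.cloud import bigquery

# Option 1: dask-bigquery -- ~40 min for my dataset, 700 files of ~12MB
ddf = dask_bigquery.read_gbq(
    project_id="my-project",   # placeholder
    dataset_id="my_dataset",   # placeholder
    table_id="my_table",       # placeholder
)
# One Parquet file per Dask partition; requires gcsfs for gs:// paths
ddf.to_parquet("gs://my-bucket/export-dask/")

# Option 2: EXPORT DATA executed server-side by BQ -- ~2 min, ~10,000 files of ~200KB
client = bigquery.Client(project="my-project")
client.query(
    """
    EXPORT DATA OPTIONS (
        uri = 'gs://my-bucket/export-bq/*.parquet',  -- single '*' wildcard is required
        format = 'PARQUET',
        overwrite = true
    ) AS
    SELECT * FROM `my-project.my_dataset.my_table`
    """
).result()  # blocks until the export job finishes
```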
Is there a way to get the best of both worlds, i.e. use the `EXPORT DATA` statement in BQ while configuring/optimising the output partitioning (file count and size)?