I’m trying to write a large amount of data (a tuple of dicts, where each dict maps a key to a Dask array) to disk as HDF5 using the function below.
import h5py

def write_to_hdf5(file_path, data):
    with h5py.File(file_path, 'w') as f:
        for result in data:  # each result is a dict mapping key -> Dask array
            if result:
                for key, value in result.items():
                    # Create one dataset per key with maximum gzip compression
                    dset = f.create_dataset(key, data=value, compression="gzip", compression_opts=9)
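For context, the data argument looks roughly like this (names, shapes, and chunking are illustrative, not my real data):

import dask.array as da

# a tuple of dicts, each mapping a dataset name to a Dask array
data = (
    {'temperature': da.random.random((20000, 20000), chunks=(1000, 1000))},
    {'pressure': da.random.random((20000, 20000), chunks=(1000, 1000))},
)

write_to_hdf5('output.h5', data)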
This results in the warning
UserWarning: Sending large graph of size 47.61 MiB. This may cause some slowdown. Consider scattering data ahead of time and using futures.
and it takes a long time to write everything to disk (~10 GB). Is there a more optimal way to write an HDF5 file with key/value data?
From the Dask website,
da.to_hdf5('myfile.hdf5', '/x', x, compression='lzf', shuffle=True)
writes an array to HDF5, but I’m not sure how to add a label/key for each array so that I can visualize it in Panoply.
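I was wondering whether da.to_hdf5 can take a dict mapping datapaths to arrays, so that each array gets its own key in the file. Something along these lines (a rough, untested sketch; the arrays name and the dict-of-datapaths call are my assumptions, not from the docs snippet above):

import dask.array as da

# hypothetical: build one datapath per key from the tuple-of-dicts structure above
arrays = {f'/{key}': value
          for result in data if result
          for key, value in result.items()}

# write every array to its own dataset in a single call (assuming to_hdf5 accepts a dict)
da.to_hdf5('myfile.hdf5', arrays, compression='lzf', shuffle=True)

Would this avoid the large-graph warning, and would each datapath show up as a separate variable in Panoply?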