I have some CUDA code and I’d like to write values out as quickly as possible in a well-organised format. I think HDF5 would be a good candidate, since most users will want to load the data using Python or MATLAB. The data is currently stored in memory allocated by a cudaMallocManaged call and addressed through a float pointer. I’ve seen the HighFive library, which seems to work with CUDA (you can compile it with nvcc, at least), but it seems to require std::vector or Boost types.
Ideally I wouldn’t have to copy the data into a standard array before writing it out, which is the solution I’ve seen elsewhere. If I could do this using GPUDirect, then wow!
I don’t have to worry about any multi-threading issues: I’m using CUDA to calculate how the values in a cube evolve with time, and I’m just writing the cube out every n timesteps.
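To make the setup concrete, here is a minimal sketch of what I have in mind, written against the plain HDF5 C API rather than HighFive. It relies on the fact that memory from cudaMallocManaged can be read by the host once cudaDeviceSynchronize has returned, so the float pointer is handed straight to H5Dwrite with no intermediate std::vector. The kernel evolve, the helpers write_snapshot and run_simulation, and the dataset names step_000005 etc. are all hypothetical placeholders, not my real code. This does not use GPUDirect; the driver still migrates the pages to the host behind the scenes.

```cuda
#include <cstddef>
#include <cstdio>
#include <cuda_runtime.h>
#include <hdf5.h>

// Hypothetical stand-in for the real update rule: adds 1 to every cell.
__global__ void evolve(float* cube, size_t n) {
    size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
    if (i < n) cube[i] += 1.0f;
}

// Writes one nx*ny*nz snapshot as dataset "step_<step>" directly from `cube`.
// Returns 0 on success, -1 on any HDF5 error.
int write_snapshot(hid_t file, const float* cube,
                   hsize_t nx, hsize_t ny, hsize_t nz, int step) {
    char name[32];
    std::snprintf(name, sizeof name, "step_%06d", step);
    hsize_t dims[3] = {nx, ny, nz};
    hid_t space = H5Screate_simple(3, dims, nullptr);
    if (space < 0) return -1;
    hid_t dset = H5Dcreate2(file, name, H5T_NATIVE_FLOAT, space,
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    if (dset < 0) { H5Sclose(space); return -1; }
    // The managed pointer is passed as an ordinary host buffer.
    herr_t status = H5Dwrite(dset, H5T_NATIVE_FLOAT, H5S_ALL, H5S_ALL,
                             H5P_DEFAULT, cube);
    H5Dclose(dset);
    H5Sclose(space);
    return status < 0 ? -1 : 0;
}

// Evolves the cube for `steps` timesteps and writes it every `write_every`.
// Assumes the cube is small enough that the block count fits in an unsigned.
int run_simulation(const char* path, int nx, int ny, int nz,
                   int steps, int write_every) {
    const size_t n = static_cast<size_t>(nx) * ny * nz;
    float* cube = nullptr;
    if (cudaMallocManaged(&cube, n * sizeof(float)) != cudaSuccess) return -1;
    for (size_t i = 0; i < n; ++i) cube[i] = 0.0f;  // host-side initialisation

    hid_t file = H5Fcreate(path, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    if (file < 0) { cudaFree(cube); return -1; }

    const unsigned threads = 256;
    const unsigned blocks = static_cast<unsigned>((n + threads - 1) / threads);
    int rc = 0;
    for (int step = 1; step <= steps && rc == 0; ++step) {
        evolve<<<blocks, threads>>>(cube, n);
        if (step % write_every == 0) {
            // All GPU work must finish before the host touches managed memory.
            if (cudaDeviceSynchronize() != cudaSuccess) { rc = -1; break; }
            rc = write_snapshot(file, cube, nx, ny, nz, step);
        }
    }
    cudaDeviceSynchronize();
    H5Fclose(file);
    cudaFree(cube);
    return rc;
}
```

Calling run_simulation("cube.h5", 64, 64, 64, 100, 10) would, under these assumptions, produce one dataset per written timestep, which Python (h5py) or MATLAB can open by name. The question is whether something like this can be made faster, or done through HighFive without the copy.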