I haven’t found a way to do the following yet:
- Read a very large data set
- Write the data to an Excel file (avoiding a JVM out-of-memory error, but ideally without writing the whole file to disk)
- Upload it in parts (to S3 in my case)
I'm hoping to do all of this piece by piece (read some, generate some, upload some, repeat).
But I haven't figured out whether I can avoid both the memory problems and writing to disk. Something like this could maybe work:
1. Incrementally read the data using pagination
2. Apache POI's streaming API (SXSSF) looks like a good way to generate it without keeping everything in memory (rough sketch of 1 and 2 after this list)
3. Use AWS S3 multipart upload to transfer it incrementally
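
To make (1) and (2) concrete, here's roughly what I had in mind for the generation side. `fetchPage` is just a placeholder for however the data set actually gets paginated (LIMIT/OFFSET query, page token, etc.), and the 100-row window and string-only columns are arbitrary:

```java
import java.io.OutputStream;
import java.util.List;

import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

public class ExcelExporter {

    private static final int PAGE_SIZE = 1_000;

    /** Placeholder for whatever actually pages through the big data set. */
    public interface PageSource {
        List<String[]> fetchPage(int offset, int size);
    }

    public void export(PageSource source, OutputStream out) throws Exception {
        // Keep only ~100 rows in memory; older rows get flushed to POI's temp files.
        SXSSFWorkbook workbook = new SXSSFWorkbook(100);
        try {
            Sheet sheet = workbook.createSheet("data");
            int rowIndex = 0;

            List<String[]> page;
            while (!(page = source.fetchPage(rowIndex, PAGE_SIZE)).isEmpty()) {
                for (String[] record : page) {
                    Row row = sheet.createRow(rowIndex++);
                    for (int col = 0; col < record.length; col++) {
                        row.createCell(col).setCellValue(record[col]);
                    }
                }
            }

            // write() takes any OutputStream, not just a FileOutputStream --
            // this is where I'd like to plug in something that feeds S3 directly.
            workbook.write(out);
        } finally {
            workbook.dispose(); // delete the temp files SXSSF created
        }
    }
}
```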
But (2) seems to be a problem:
Apache POI streaming keeps only a limited number of rows in memory, which is great, but the end result is still flushing the workbook out to a file.
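
For (3), this is roughly what I was picturing on the upload side, if I can get POI to write to a stream at all: an `OutputStream` that buffers bytes into 5 MB chunks (S3's minimum part size) and pushes each chunk as a part. This uses the v1 AWS SDK; the bucket and key come from the caller, and I've left out error handling / aborting the upload on failure:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.CompleteMultipartUploadRequest;
import com.amazonaws.services.s3.model.InitiateMultipartUploadRequest;
import com.amazonaws.services.s3.model.PartETag;
import com.amazonaws.services.s3.model.UploadPartRequest;

/** Buffers written bytes and ships them to S3 as multipart-upload parts. */
public class S3MultipartOutputStream extends OutputStream {

    private static final int PART_SIZE = 5 * 1024 * 1024; // S3 minimum part size (except the last part)

    private final AmazonS3 s3;
    private final String bucket;
    private final String key;
    private final String uploadId;
    private final List<PartETag> partETags = new ArrayList<>();
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream(PART_SIZE);
    private int partNumber = 1;

    public S3MultipartOutputStream(AmazonS3 s3, String bucket, String key) {
        this.s3 = s3;
        this.bucket = bucket;
        this.key = key;
        this.uploadId = s3.initiateMultipartUpload(
                new InitiateMultipartUploadRequest(bucket, key)).getUploadId();
    }

    @Override
    public void write(int b) {
        buffer.write(b);
        if (buffer.size() >= PART_SIZE) {
            uploadPart();
        }
    }

    @Override
    public void write(byte[] b, int off, int len) {
        buffer.write(b, off, len);
        if (buffer.size() >= PART_SIZE) {
            uploadPart();
        }
    }

    private void uploadPart() {
        byte[] bytes = buffer.toByteArray();
        UploadPartRequest request = new UploadPartRequest()
                .withBucketName(bucket)
                .withKey(key)
                .withUploadId(uploadId)
                .withPartNumber(partNumber++)
                .withInputStream(new ByteArrayInputStream(bytes))
                .withPartSize(bytes.length);
        partETags.add(s3.uploadPart(request).getPartETag());
        buffer.reset();
    }

    @Override
    public void close() throws IOException {
        if (buffer.size() > 0) {
            uploadPart(); // the final part is allowed to be smaller than 5 MB
        }
        s3.completeMultipartUpload(
                new CompleteMultipartUploadRequest(bucket, key, uploadId, partETags));
    }
}
```

In theory `workbook.write(new S3MultipartOutputStream(s3, bucket, key))` followed by a `close()` would stream the file up as POI serializes it, but I don't know whether that actually gets around the flush-to-disk issue.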
Has anyone done something like this before? (And is it even possible given that Excel is a binary format?)