I have a large CSV file (60 GB) that does not fit into RAM. The first column contains a sorted index ranging from 2000 to 2999, and the same index value can appear in many rows. I want to split the file into 10 files of approximately 6 GB each, without splitting any index value across two files: all rows with the same index must end up in the same file.
Thus, the first file will contain the rows with index 2000 to 2099, the next one the rows with index 2100 to 2199, and so on. Each file should also start with the header row of the original file.
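A minimal sketch of this range-based scheme in Python, assuming the first column is an unquoted integer and that no quoted field contains an embedded newline (so each physical line is exactly one row). It reads the file once, line by line, so memory use stays constant regardless of file size. The function name and the `part_<lo>_<hi>.csv` output naming are hypothetical.

```python
import os


def split_by_index_range(src_path, out_dir, start=2000, width=100):
    """Stream src_path once, routing each row to a file chosen by its index.

    Rows with start + k*width <= index < start + (k+1)*width go to bucket k.
    Assumes the first column is an unquoted integer and that no field contains
    an embedded newline, so each physical line is exactly one CSV row.
    """
    os.makedirs(out_dir, exist_ok=True)
    outputs = {}  # bucket number -> open output file
    try:
        # newline="" keeps the original line endings untouched on read and write
        with open(src_path, newline="") as src:
            header = src.readline()
            for line in src:
                if not line.strip():
                    continue  # skip blank lines
                index = int(line.split(",", 1)[0])
                bucket = (index - start) // width
                out = outputs.get(bucket)
                if out is None:
                    lo = start + bucket * width
                    # hypothetical naming scheme: part_2000_2099.csv, ...
                    name = f"part_{lo}_{lo + width - 1}.csv"
                    out = open(os.path.join(out_dir, name), "w", newline="")
                    out.write(header)  # every output file starts with the header
                    outputs[bucket] = out
                out.write(line)
    finally:
        for out in outputs.values():
            out.close()
    return sorted(outputs)
```

For the file described here, `split_by_index_range("big.csv", "parts")` would produce up to ten files, `part_2000_2099.csv` through `part_2900_2999.csv`. Keeping a dict of open handles (at most ten in this case) means the sketch does not rely on the input being sorted, even though the index here is.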
I can’t use a tool like qsv’s split, because the number of rows may differ between the 10 files. I also tried using qsv apply, but it seems to load everything into RAM.
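If roughly equal file sizes matter more than fixed 100-value ranges, a variant can cut on byte count instead: it starts a new file only once the current one has reached a target size and the index value changes. This is a sketch that reads line by line (constant memory), assuming one physical line per row and an input sorted by its first column; the function name and the `part_NN.csv` naming are hypothetical.

```python
import os


def split_by_size_at_index_boundary(src_path, out_dir, target_bytes):
    """Split src_path into files of roughly target_bytes each.

    A new file is started only when the current file has reached target_bytes
    AND the first-column value changes, so rows sharing an index stay together.
    Relies on the input being sorted by its first column and on each physical
    line being one row (no embedded newlines in quoted fields).
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    out = None
    written = 0
    prev_key = None
    try:
        # binary mode: byte counts are exact and lines are copied verbatim
        with open(src_path, "rb") as src:
            header = src.readline()
            for line in src:
                key = line.split(b",", 1)[0].strip()
                if out is None or (written >= target_bytes and key != prev_key):
                    if out is not None:
                        out.close()
                    # hypothetical naming scheme: part_00.csv, part_01.csv, ...
                    path = os.path.join(out_dir, f"part_{len(paths):02d}.csv")
                    out = open(path, "wb")
                    out.write(header)  # every output file starts with the header
                    written = len(header)
                    paths.append(path)
                out.write(line)
                written += len(line)
                prev_key = key
    finally:
        if out is not None:
            out.close()
    return paths
```

With `target_bytes=6 * 10**9`, a 60 GB input would yield about ten files. Every file except the last ends up at or slightly above the target, because a file is only closed at an index boundary; the exact count depends on how rows are distributed across index values.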