I have a large number of large integers, e.g. generated using:
<code>import random
low = 10 ** 10
high = 10 ** 100
numbers = sorted([random.randint(low, high) for _ in range(10000000)])
</code>
It can be assumed that they will always be sorted before saving. My goal is to save them to a file for long-term storage in the most space-efficient way. I have tried several approaches.
- Plain text and .tar.gz gave me the best compression so far.
- Saving to pickle was fast (which matters, but less than space efficiency) and produced a file size comparable to, but worse than, .tar.gz.
- I tried converting to byte arrays and storing to h5 using h5py and gzip compression, however that was both slow and not as space efficient as the other 2 methods.
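For reference, here is a minimal sketch of how such sizes can be compared on a smaller sample. It is only an approximation of what I did: it gzips a single in-memory buffer rather than building an actual .tar.gz archive, and it uses plain fixed-width bytes instead of h5py, so the numbers are indicative at best. The function names and the 42-byte width (enough for values up to 10 ** 100, which needs 333 bits) are my own choices for illustration.

```python
import gzip
import pickle
import random


def text_gzip_size(numbers):
    """Bytes used when writing one decimal number per line, then gzipping."""
    raw = "\n".join(map(str, numbers)).encode("ascii")
    return len(gzip.compress(raw))


def pickle_size(numbers):
    """Bytes used by an uncompressed pickle of the list."""
    return len(pickle.dumps(numbers, protocol=pickle.HIGHEST_PROTOCOL))


def fixed_width_gzip_size(numbers, width=42):
    """Bytes used when packing each number into `width` big-endian bytes, then gzipping.

    width=42 is an assumption: 10 ** 100 fits in 333 bits, i.e. 42 bytes.
    """
    raw = b"".join(n.to_bytes(width, "big") for n in numbers)
    return len(gzip.compress(raw))


# Small sample so the comparison runs quickly; the real data has 10,000,000 values.
random.seed(0)
low = 10 ** 10
high = 10 ** 100
sample = sorted(random.randint(low, high) for _ in range(10_000))

sizes = {
    "text + gzip": text_gzip_size(sample),
    "pickle": pickle_size(sample),
    "fixed-width bytes + gzip": fixed_width_gzip_size(sample),
}
for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```

The relative ranking on a small sample may differ from what I observed on the full data set.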
User @RobbyCornelissen suggested another method in this answer; however, I am not sure how to implement it in Python.