I am trying to write an array of data to a binary file with the intention of later opening that file to look up specific rows. Looking around online, I found multiple ways to save the data, e.g. df.to_pickle, struct.pack, and numpy.ndarray.tobytes(). Now comes the part of reading it back. I found other posts about reading binary data, but so far none have helped me retrieve mine. I am under the impression it might have to do with how I serialize the data.
data format:

        0       1      2     3   ...   n
  0
  1  -0.111
  2   0.84   0.1
  3   0.25   0.6  -0.2
  .
  .
  .
  n
This is a pairwise-comparison dataset, so where a variable crosses itself the value is 1 and only the lower triangle is stored; row k therefore holds k values. Because n can be very, very big, the full table cannot be held in memory, so it was generated row by row into a text file.
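Since row k holds k float64 values (8 bytes each), my thinking is that the byte offset of any row can be computed directly instead of scanned for. Here is the helper I sketched (the extra 1 accounts for the single byte I write for row 0 in the conversion code below):

HEADER_BYTES = 1  # the single b"\n" written for row 0 below

def row_offset(k):
    # rows 1 .. k-1 contribute 1 + 2 + ... + (k-1) = k*(k-1)/2 values,
    # each 8 bytes wide
    return HEADER_BYTES + 8 * (k * (k - 1) // 2)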
To convert to binary, I read each line of the text file, convert the values to np.float64, and then write the result with numpy.ndarray.tobytes():
import numpy as np
from pathlib import Path

save_path = Path(save_loc)
save_data = save_path.open("wb")            # binary mode for writing raw bytes
with load_data_path.open() as data_chunk:
    for idx, data in enumerate(data_chunk, 0):
        if idx == 0:
            save_data.write(b"\n")          # row 0 holds no values
            continue
        dlist = data.strip("\n").split(",")
        d_array = np.array([np.float64(x) for x in dlist])
        save_data.write(d_array.tobytes())  # 8 bytes per float64 value
save_data.close()
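From what I can tell, because each row has a different byte length and raw float64 bytes can themselves contain the newline byte 0x0A, the binary file has no meaningful "lines", so a sequential read would need explicit byte counts. Something like this sketch (data_loc is the path to the binary file):

import numpy as np
from pathlib import Path

with Path(data_loc).open("rb") as fh:
    fh.read(1)                   # skip the single byte written for row 0
    k = 1
    while True:
        raw = fh.read(8 * k)     # row k holds k float64 values
        if len(raw) < 8 * k:     # end of file
            break
        row = np.frombuffer(raw, dtype=np.float64)
        k += 1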
For reading the data, I attempted to use np.frombuffer and struct.unpack, but both resulted in errors. In addition, from my understanding, reading the whole file that way would pull all the data into memory, which would not work for my data. I opted to open the binary file using Path, walk to the target line, and read it directly. Here is the code:
import numpy as np
from pathlib import Path

find_line = 7984
load_data = Path(data_loc)
with load_data.open("rb") as data_chunk:
    for idx, data in enumerate(data_chunk, 0):
        if idx < find_line:
            continue
        else:
            my_line = np.frombuffer(data, "f", idx)
            break
This, however, results in an error:
ValueError: buffer is smaller than requested size
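For reference, this is the kind of direct lookup I am hoping to end up with, building on the row_offset helper above (just a sketch; it assumes the triangular layout described earlier, and that the dtype must be np.float64 to match what was written rather than "f"/float32):

import numpy as np
from pathlib import Path

find_line = 7984
with Path(data_loc).open("rb") as fh:
    fh.seek(row_offset(find_line))      # jump straight to the start of the row
    raw = fh.read(8 * find_line)        # row k holds k float64 values
    my_line = np.frombuffer(raw, dtype=np.float64)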