I have a very large DataFrame that is about 3.5 GB when exported to JSON. I noticed that the export consumes an extra 7 GB of memory. It looks like pandas builds the whole JSON string (3.5 GB) in memory first and only then writes it to the file, instead of streaming the generated string/bytes directly into the file. On top of that, another 3.5 GB is consumed while writing the JSON string to the file, presumably to encode the string to bytes. Why is the encoding not done in chunks?
Is there a way to export JSON directly to a file without holding the whole JSON string in memory first, i.e. keeping only a small buffer (about 10-50 MB) in memory?
from time import sleep

import pandas as pd

large_df: pd.DataFrame  # assume this already holds the data (~3.5 GB when serialized)

sleep(5)  # pause so memory usage before the export can be observed
with open("exported.json", "w") as f:
    large_df.to_json(f, orient="records", date_format="iso")
sleep(5)  # pause so memory usage after the export can be observed
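The closest workaround I can think of is slicing the DataFrame myself and writing each slice's JSON separately, so only one slice's string is in memory at a time. This is just a sketch (the 50,000-row chunk size and the to_json_chunked helper are my own assumptions, not a pandas API), and it has to stitch the per-chunk records arrays together by hand:

import pandas as pd

def to_json_chunked(df: pd.DataFrame, path: str, chunk_rows: int = 50_000) -> None:
    # Write df as one JSON array of records, serializing only chunk_rows rows
    # at a time, so peak memory is roughly one chunk's JSON string.
    with open(path, "w") as f:
        f.write("[")
        first = True
        for start in range(0, len(df), chunk_rows):
            chunk = df.iloc[start:start + chunk_rows]
            # orient="records" returns e.g. '[{"a":1},{"a":2}]'; strip the
            # surrounding brackets so chunks can be joined with commas.
            body = chunk.to_json(orient="records", date_format="iso")[1:-1]
            if not first:
                f.write(",")
            f.write(body)
            first = False
        f.write("]")

Tuning chunk_rows would roughly control the buffer size, but this feels like reimplementing something to_json should handle itself, hence the question.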