On my Ubuntu server, I have a few JSON files containing scraped data. Each file is around 1GB.
What is the best way to store all of these files in one database and query it?
Here’s an example of what the JSON files look like (one JSON object per line):
{"id":"emp1","full_name":"John Doe","gender":"male","birth_year":"1998","birth_date":"1998-12-01","job_title":"team lead","location_country":"france","emails":[{"address":"[email protected]","type":"personal"}, {"address":"[email protected]","type":"work"}]}
{"id":"emp2","full_name":"Jane Smith","gender":"female","birth_year":null,"birth_date":null,"job_title":"Manager","location_country":"italy","emails":[{"address":"[email protected]","type":"personal"}, {"address":"[email protected]","type":"work"}]}
{"id":"emp3","full_name":"Dave Davids","gender":"male","birth_year":"1991","birth_date":"1991-10-01","job_title":"Intern","location_country":"germany","emails":[{"address":"[email protected]","type":"personal"}, {"address":"[email protected]","type":"work"}]}
{...}
I have tried converting the JSON to CSV using jq, but ran into many issues, especially since the objects have many keys and some of the values are null.
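For reference, here is a minimal Python sketch of the flattening I was attempting with jq (a simplified version; the real files have more keys, and the file names are just placeholders):

```python
import csv
import json

FIELDS = ["id", "full_name", "gender", "birth_year", "birth_date",
          "job_title", "location_country", "emails"]

# "scraped_part1" is a placeholder name for one of the real files
with open("scraped_part1.json", encoding="utf-8") as src, \
     open("scraped_part1.csv", "w", newline="", encoding="utf-8") as dst:
    writer = csv.DictWriter(dst, fieldnames=FIELDS)
    writer.writeheader()
    for line in src:  # one JSON object per line
        row = json.loads(line)
        # Flatten the nested email list into a single semicolon-separated cell
        row["emails"] = ";".join(e["address"] for e in row.get("emails") or [])
        # Replace nulls with empty strings so the CSV stays well-formed
        writer.writerow({k: row.get(k) if row.get(k) is not None else ""
                         for k in FIELDS})
```

Even when this works, flattening the nested email list into CSV feels lossy, which is part of why I'm asking about databases instead.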
I also tried to import the files using MySQL Workbench, but it would not recognize them as valid JSON (presumably because each file is a sequence of newline-delimited objects rather than a single JSON array).
What do you think is the best type of database for storing this data, and what would the loading process look like (a Python script, a bash script, plain SQL)?
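To make the question concrete, here is a minimal sketch of the kind of loader I have in mind, using SQLite purely as an illustration (the database, table, and file names are placeholders, and it assumes a SQLite build with the JSON1 functions; I'm open to PostgreSQL, MongoDB, or whatever fits better):

```python
import glob
import json
import sqlite3

conn = sqlite3.connect("scraped.db")
conn.execute("CREATE TABLE IF NOT EXISTS people (id TEXT PRIMARY KEY, doc TEXT)")

for path in glob.glob("*.json"):  # one JSON object per line in each file
    with open(path, encoding="utf-8") as f:
        rows = ((json.loads(line)["id"], line.strip())
                for line in f if line.strip())
        conn.executemany("INSERT OR REPLACE INTO people VALUES (?, ?)", rows)
    conn.commit()

# Query nested values with SQLite's built-in JSON functions
for row in conn.execute(
        "SELECT json_extract(doc, '$.full_name') FROM people "
        "WHERE json_extract(doc, '$.location_country') = 'france'"):
    print(row[0])
```

Would an approach like this hold up with files of this size, or is a database with a native JSON column type the better route?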