Read a Large JSON File and Flatten It in Lambda

I have a Lambda function that reads in a JSON file and flattens it so that I can convert it to Parquet, which makes the data easier to read. The issue I’m having is that some of the files are too large, and the Lambda function times out before it can flatten the file. As a workaround for the timeout, I tried a Glue job with the exact same code, but it takes hours to process just one file.

After doing some research, I found that reading the JSON in chunks might help. I was able to read the file in chunks and convert it to Parquet successfully, but I can’t work out how to flatten the JSON while reading it in chunks.

Here is my code. I need to flatten the JSON in the first if statement, where ‘largefile’ is in the fullstring. The elif section just flattens normal, smaller JSON files. The else statement just reads in the JSON file without flattening. After that, I convert all datatypes to string and convert to Parquet. Any help is appreciated.
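For reference, here is a minimal sketch of one way flattening could be combined with chunked reading. It assumes the large file is newline-delimited JSON (one object per line), which the post does not confirm. The names `flatten_record` and `iter_flat_chunks`, the dotted key separator, and the sample data are all hypothetical and not taken from the original code. It uses only the standard library, so each flattened chunk would still need to be handed to whatever Parquet writer is already in use.

```python
import io
import json
from itertools import islice


def flatten_record(obj, parent_key="", sep="."):
    """Flatten nested dicts/lists into one level, joining keys with sep."""
    flat = {}
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        # List elements are keyed by their position: "tags.0", "tags.1", ...
        items = ((str(i), v) for i, v in enumerate(obj))
    else:
        # A scalar has nothing to flatten.
        return {parent_key: obj}
    for key, value in items:
        new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, (dict, list)) and value:
            flat.update(flatten_record(value, new_key, sep))
        else:
            # Scalars and empty containers are stored as-is.
            flat[new_key] = value
    return flat


def iter_flat_chunks(lines, chunk_size=1000):
    """Yield lists of flattened records, at most chunk_size per list.

    Assumption: `lines` is an iterable of JSON Lines text (one object per line).
    """
    records = (json.loads(line) for line in lines if line.strip())
    while True:
        chunk = [flatten_record(r) for r in islice(records, chunk_size)]
        if not chunk:
            return
        yield chunk


# Hypothetical sample data standing in for the large file.
sample = io.StringIO(
    '{"id": 1, "user": {"name": "a", "tags": ["x", "y"]}}\n'
    '{"id": 2, "user": {"name": "b", "tags": []}}\n'
    '{"id": 3, "user": {"name": "c", "tags": ["z"]}}\n'
)
chunks = list(iter_flat_chunks(sample, chunk_size=2))
```

Because only one chunk of records is held in memory at a time, each chunk could be flattened, cast to strings, and written out before the next one is read. If the file is instead a single large JSON array or object, this line-based approach would not apply as written and an incremental parser would be needed.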