PySpark – Flatten JSON column from Parquet files with multiple schemas
I have an S3 bucket with Parquet files partitioned by a column that serves as an “id” for the different JSON payloads we receive. The problem is that even within a single id the JSON documents vary: they follow a general “pattern”, but the fields are never exactly the same, so I can't declare a fixed schema up front.
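For concreteness, this is roughly what I'm attempting today (the bucket paths and the column names `id` and `json_col` are placeholders for my real layout): read the Parquet files, infer a schema per id from the JSON strings themselves, then flatten the resulting struct.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Parquet files partitioned by `id`; `json_col` holds the raw JSON string.
df = spark.read.parquet("s3://my-bucket/raw/")

for row in df.select("id").distinct().collect():
    subset = df.filter(F.col("id") == row["id"])

    # Infer a schema from this id's JSON strings only, since each id
    # has its own (loosely defined) shape.
    inferred = spark.read.json(
        subset.select("json_col").rdd.map(lambda r: r["json_col"])
    )

    # Parse the JSON column with the inferred schema, then promote
    # every top-level struct field to its own column.
    flat = (
        subset
        .withColumn("parsed", F.from_json("json_col", inferred.schema))
        .select("id", "parsed.*")
    )

    flat.write.mode("overwrite").parquet(f"s3://my-bucket/flat/id={row['id']}/")
```

This only promotes top-level fields (nested structs and arrays would need further flattening), and looping over ids with a schema inference pass per partition feels expensive, so I'm not sure this is the right approach.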