PySpark – Flatten JSON column from Parquet files with multiple schemas
I have an S3 bucket with Parquet files partitioned by a column that serves as an “id” for the different JSON payloads we receive. The problem is that even within a single id the JSON documents vary: they follow a general “pattern”, but the fields are never exactly the same, so I can't declare a fixed schema up front.
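For concreteness, this is roughly what I'm attempting today (the bucket paths and the column names `id` and `json_col` are placeholders for my real layout): read the Parquet files, infer a schema per id from the JSON strings themselves, then flatten the resulting struct.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Parquet files partitioned by `id`; `json_col` holds the raw JSON string.
df = spark.read.parquet("s3://my-bucket/raw/")

for row in df.select("id").distinct().collect():
    subset = df.filter(F.col("id") == row["id"])

    # Infer a schema from this id's JSON strings only, since each id
    # has its own (loosely defined) shape.
    inferred = spark.read.json(
        subset.select("json_col").rdd.map(lambda r: r["json_col"])
    )

    # Parse the JSON column with the inferred schema, then promote
    # every top-level struct field to its own column.
    flat = (
        subset
        .withColumn("parsed", F.from_json("json_col", inferred.schema))
        .select("id", "parsed.*")
    )

    flat.write.mode("overwrite").parquet(f"s3://my-bucket/flat/id={row['id']}/")
```

This only promotes top-level fields (nested structs and arrays would need further flattening), and looping over ids with a schema inference pass per partition feels expensive, so I'm not sure this is the right approach.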