I'm migrating CSV files to Parquet, and the schema of my CSV files changed over time.
So I'm trying to read some CSV files that contain 12 columns using a schema that contains 20 columns.
Is there a way to handle this with PySpark?
I tried reading the CSV files with an explicit PySpark schema, but the resulting DataFrame contains NULL values everywhere.
When I read them without specifying a schema, all the values are correct, but the DataFrame columns are just named (_c0, _c1, …) and every type is string.
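Here is roughly what I'm running (the path, column names, and types below are placeholders, not my real ones):

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.getOrCreate()

# Target schema: 20 columns (col1 ... col20 stand in for my real names/types)
full_schema = StructType(
    [StructField(f"col{i}", StringType(), True) for i in range(1, 21)]
)

# Attempt 1: read an older 12-column file with the 20-column schema
# -> every value in the resulting DataFrame is NULL
df_with_schema = spark.read.csv("path/to/old_file.csv", schema=full_schema)
df_with_schema.show()

# Attempt 2: read the same file without a schema
# -> values are correct, but columns are _c0 ... _c11 and every type is string
df_no_schema = spark.read.csv("path/to/old_file.csv")
df_no_schema.printSchema()
```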
Any help would be very appreciated! Thanks