In a Spark data pipeline, I rely on mapPartitions to run some computations over each partition. I prepare some data and then want to write it out partitioned on disk using DataFrameWriter.partitionBy.
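Roughly what I'm doing, simplified (the input/output paths, the "date" partition column, and the process function are all made-up placeholders):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("example").getOrCreate()

df = spark.read.parquet("/data/input")  # hypothetical input path

# Per-partition computation via the RDD API
def process(rows):
    for row in rows:
        yield row  # stand-in for the real computation

result = spark.createDataFrame(df.rdd.mapPartitions(process), df.schema)

# Write the result partitioned by a key column
result.write.partitionBy("date").parquet("/data/output")
```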
Is it guaranteed that each partition contains all columns? Given Parquet's columnar format, I'm not sure whether I can trust that each individual Parquet file will actually contain the full column set. It appears to be the case for a small dataset, but can I rely on it in general?
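For context, this is how I've been spot-checking on the small dataset (the leaf directory path is illustrative):

```python
# Read one partition's leaf directory directly and inspect its schema
one_partition = spark.read.parquet("/data/output/date=2024-01-01")
one_partition.printSchema()
```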