.gz files are unsplittable, but when I place one in HDFS it is still stored as multiple blocks depending on the block size
We all know .gz is non-splittable, meaning only a single core can read it. So when I place a huge .gz file on HDFS, I expected it to end up as a single block. Instead, I see it being split into 128 MB blocks. How is it possible to split the file in HDFS but not in Spark?
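To illustrate what I mean by "non-splittable": here is a small self-contained sketch (plain Python, using `gzip`/`zlib` as a stand-in for the actual file on HDFS) of the behavior I'm asking about. Chopping the compressed bytes at an arbitrary offset is always possible, yet a reader handed only the second half cannot decompress it, because a gzip DEFLATE stream has to be read from byte 0.

```python
import gzip
import zlib

# Stand-in for a large .gz file: compress some repetitive text in memory.
data = b"a line of log text\n" * 100_000
raw = gzip.compress(data)

# "HDFS-style" split: blocks are plain byte ranges, so the compressed
# bytes can always be cut in half regardless of the file format.
mid = len(raw) // 2
first_half, second_half = raw[:mid], raw[mid:]

# A reader that starts at byte 0 can decompress the whole stream.
# (wbits = MAX_WBITS | 16 tells zlib to expect a gzip header.)
out = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16).decompress(raw)
assert out == data

# A reader given only the second block cannot: there is no gzip header
# and no sync point mid-stream, so decompression fails.
try:
    zlib.decompressobj(wbits=zlib.MAX_WBITS | 16).decompress(second_half)
    second_block_readable = True
except zlib.error:
    second_block_readable = False

assert second_block_readable is False
```

So at the byte level the file splits fine, which seems to match what HDFS is doing; it's only the decompression that can't start mid-file. Is that the whole explanation for the behavior I'm seeing?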