.gz files are unsplittable, but when I place one in HDFS it is still stored as multiple blocks depending on the block size
We all know .gz is non-splittable, meaning only a single core can read it. So when I place a huge .gz file on HDFS, I expected it to end up as a single block. Instead, I see it being split into 128 MB blocks. How is it possible to split the file in HDFS but not in Spark?
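To illustrate what I mean by "non-splittable": here is a small self-contained sketch (plain Python, using `gzip`/`zlib` as a stand-in for the actual file on HDFS) of the behavior I'm asking about. Chopping the compressed bytes at an arbitrary offset is always possible, yet a reader handed only the second half cannot decompress it, because a gzip DEFLATE stream has to be read from byte 0.

```python
import gzip
import zlib

# Stand-in for a large .gz file: compress some repetitive text in memory.
data = b"a line of log text\n" * 100_000
raw = gzip.compress(data)

# "HDFS-style" split: blocks are plain byte ranges, so the compressed
# bytes can always be cut in half regardless of the file format.
mid = len(raw) // 2
first_half, second_half = raw[:mid], raw[mid:]

# A reader that starts at byte 0 can decompress the whole stream.
# (wbits = MAX_WBITS | 16 tells zlib to expect a gzip header.)
out = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16).decompress(raw)
assert out == data

# A reader given only the second block cannot: there is no gzip header
# and no sync point mid-stream, so decompression fails.
try:
    zlib.decompressobj(wbits=zlib.MAX_WBITS | 16).decompress(second_half)
    second_block_readable = True
except zlib.error:
    second_block_readable = False

assert second_block_readable is False
```

So at the byte level the file splits fine, which seems to match what HDFS is doing; it's only the decompression that can't start mid-file. Is that the whole explanation for the behavior I'm seeing?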