Spark fails at dropDuplicates() due to multiple ExecutorLostFailure
I have 479 Parquet files (approx. 120 MB each, a little over 2 billion records in total) stored on HDFS, and I am trying to determine the best Spark configuration for this dataset, but I cannot get the dropDuplicates() operation to complete: the job keeps dying with ExecutorLostFailure errors. My HDFS configuration is:
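For reference, the deduplication step itself is nothing exotic; a minimal sketch of the job, assuming hypothetical input and output paths on HDFS:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dedup").getOrCreate()

# Hypothetical paths; the real dataset locations differ.
df = spark.read.parquet("hdfs:///data/events")  # 479 Parquet files, ~2B rows

# dropDuplicates() forces a full shuffle of every row; the executors
# are lost during this shuffle stage.
deduped = df.dropDuplicates()

deduped.write.mode("overwrite").parquet("hdfs:///data/events_dedup")
```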
How to check which HDFS datanode IP is returned by the namenode to Spark?
If I’m reading or writing a DataFrame in PySpark while specifying the HDFS namenode hostname and port explicitly:
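Something like the following, where the namenode hostname, port, and paths are hypothetical placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hdfs-io").getOrCreate()

# Hypothetical namenode hostname/port and paths; substitute your own.
df = spark.read.parquet("hdfs://namenode.example.com:8020/user/data/input")

df.write.mode("overwrite").parquet(
    "hdfs://namenode.example.com:8020/user/data/output"
)
```

The namenode only serves metadata; the actual blocks are streamed from datanodes, so how can I see which datanode IPs the namenode hands back to Spark for these reads and writes?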