Tag Archive for apache-spark, pyspark, apache-spark-sql

Executor distribution across nodes in a cluster

How are the executors of a Spark application distributed across the nodes of a cluster? Say Spark is running in cluster mode with YARN as the cluster manager, and the cluster has 6 nodes, each with 16 cores and 64 GB of memory. With the following configuration, how are the executors distributed across the cluster:
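The question's actual configuration values were not included above, but a common sizing for exactly this cluster shape works out as follows. This is a sketch with hypothetical numbers, following the usual YARN sizing guideline of reserving one core and some memory per node for the OS and Hadoop daemons:

    # Hypothetical sizing for a 6-node cluster with 16 cores / 64 GB per node.
    NODES, CORES_PER_NODE, MEM_PER_NODE_GB = 6, 16, 64

    usable_cores = CORES_PER_NODE - 1      # leave 1 core per node for OS/daemons
    usable_mem_gb = MEM_PER_NODE_GB - 1    # leave 1 GB per node likewise

    executor_cores = 5                     # <= 5 is a common guideline for HDFS throughput
    executors_per_node = usable_cores // executor_cores   # 3 executors fit per node
    total_executors = NODES * executors_per_node - 1      # 17: one slot goes to the YARN AM

    # Split each node's usable memory across its executors, then back out the
    # off-heap overhead YARN adds on top (default: max(384 MB, 10% of the heap)).
    container_gb = usable_mem_gb / executors_per_node     # 21 GB per container
    executor_memory_gb = container_gb / 1.10              # ~19 GB for spark.executor.memory

    print(total_executors, executor_cores, round(executor_memory_gb))  # 17 5 19

With these (assumed) settings, YARN would place 3 executors on each of the 6 nodes; one of the 18 slots is taken by the application master (which hosts the driver in cluster mode), leaving 17 executors of 5 cores and roughly 19 GB each.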

Number of cores in an executor and OOM error

I have read some articles on OOM errors in the executors of a Spark application, and a number of them mention high concurrency as one of the possible causes. I am aware that concurrency is determined by the number of executor cores, which caps the number of tasks that can run concurrently within an executor.
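A quick way to see why high concurrency raises OOM risk is to work out the per-task share of an executor's unified memory pool. This is a rough sketch: the actual split is dynamic under Spark's unified memory manager, and the 8 GB heap here is a hypothetical value:

    # Rough per-task share of an executor's unified memory pool.
    executor_memory_mb = 8 * 1024      # hypothetical spark.executor.memory = 8g
    reserved_mb = 300                  # fixed reserved memory
    memory_fraction = 0.6              # spark.memory.fraction default

    unified_mb = (executor_memory_mb - reserved_mb) * memory_fraction

    for cores in (2, 4, 8):
        # Up to `cores` tasks run at once and compete for the same pool,
        # so each task's fair share shrinks as concurrency grows.
        print(cores, "cores ->", round(unified_mb / cores), "MB per task")

Doubling the cores halves each task's fair share of execution memory, so an executor sized comfortably for 2 concurrent tasks can start spilling or hitting OOM at 8.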

Spark executor memory overhead

From this blog, I understand there is reserved memory within each executor, amounting to a constant 300 MB. The article says that, as of Spark 1.6, this reserved memory can be changed only by recompiling Spark. The Spark configuration docs list spark.executor.memoryOverhead, introduced in Spark 2.3. Does this config determine the size of the reserved memory that was difficult to change in Spark 1.6+?
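For reference, here is how the two settings relate under the standard executor memory model (a sketch with hypothetical values). Note that memoryOverhead is off-heap memory added on top of the heap when sizing the YARN container, whereas the 300 MB reserved memory is a fixed slice inside the heap, so the two are distinct:

    # How spark.executor.memory and spark.executor.memoryOverhead combine
    # into a single YARN container request (hypothetical 4 GB executor).
    executor_memory_mb = 4 * 1024

    # Default overhead: 10% of executor memory, with a 384 MB floor.
    overhead_mb = max(384, int(0.10 * executor_memory_mb))

    container_mb = executor_memory_mb + overhead_mb    # what YARN actually allocates

    reserved_mb = 300                                  # fixed slice *inside* the heap
    usable_heap_mb = executor_memory_mb - reserved_mb  # what Spark's memory pools divide up

    print(container_mb, usable_heap_mb)                # 4505 3796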