I am a Product Owner working on the configuration of our Spark deployment, which runs in a Kubernetes environment on Azure. My development team doesn't have much experience with this, so I'm researching it on my own.
I'm trying to determine what values to configure for the executors on each worker. After reading a lot of documentation, here is the configuration I plan to ask the team to implement:
- Retrieve the available cores and RAM on the worker
- Deduct 1 core and 1 GB for the OS and Hadoop daemon
- Set spark.executor.cores = 3, which allows us to define the maximum number of executors as: (worker_cores – 1) / 3
- Set spark.executor.memory as: (worker_memory – 1 GB) / max_executors (a sketch of this calculation is shown below)
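To make the intended calculation concrete, here is a minimal sketch of the sizing logic described above. The function name and the worker resource values are just placeholders for illustration, not our real cluster numbers:

```python
# Hypothetical sizing sketch for the plan above.

def executor_layout(worker_cores: int, worker_memory_gb: int,
                    cores_per_executor: int = 3) -> dict:
    """Compute executors per worker and memory per executor."""
    usable_cores = worker_cores - 1          # reserve 1 core for OS/daemons
    usable_memory_gb = worker_memory_gb - 1  # reserve 1 GB for OS/daemons

    max_executors = usable_cores // cores_per_executor
    executor_memory_gb = usable_memory_gb // max_executors

    return {
        "spark.executor.cores": cores_per_executor,
        "max_executors_per_worker": max_executors,
        "spark.executor.memory": f"{executor_memory_gb}g",
    }

# Example: a worker node with 16 cores and 64 GB of RAM
print(executor_layout(worker_cores=16, worker_memory_gb=64))
# -> {'spark.executor.cores': 3, 'max_executors_per_worker': 5,
#     'spark.executor.memory': '12g'}
```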
And this is where my question arises: I can't find a clear answer on whether spark.executor.memoryOverhead and spark.memory.offHeap.size need to be added for each executor, or whether they are counted once per worker and are therefore independent of the number of executors.
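To show exactly what I mean, here are the two interpretations I'm weighing, with placeholder figures only (I don't know which one is correct, which is the point of the question):

```python
# Placeholder figures; this only contrasts the two interpretations I'm asking about.
executor_memory_gb = 12   # spark.executor.memory
memory_overhead_gb = 2    # spark.executor.memoryOverhead
off_heap_gb = 1           # spark.memory.offHeap.size
max_executors = 5         # executors per worker, from the sizing above

# Interpretation 1: overhead and off-heap are per executor,
# so each executor's footprint is the sum of all three.
per_executor_total_gb = executor_memory_gb + memory_overhead_gb + off_heap_gb
print(f"per-executor footprint: {per_executor_total_gb} GB")   # 15 GB each

# Interpretation 2: overhead and off-heap are counted once per worker,
# so only spark.executor.memory scales with the executor count.
per_worker_total_gb = (executor_memory_gb * max_executors
                       + memory_overhead_gb + off_heap_gb)
print(f"per-worker total: {per_worker_total_gb} GB")            # 63 GB
```

The answer changes how much of the worker's RAM I should leave unallocated when computing spark.executor.memory.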
Thank you for your help!
I am expecting to understand how to properly set up Spark memory overhead and off-heap memory.