I’m using the YARN container runtime (Docker) on EMR. To submit a step to the cluster, I run:
spark-submit \
  --deploy-mode cluster \
  --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
  --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=$DOCKER_IMAGE_NAME \
  --conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
  --conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=$DOCKER_IMAGE_NAME \
  recipe.py
If I want to open a PySpark shell or use Zeppelin with the YARN container runtime, how do I do it? When I set the same configuration options for the PySpark shell, it doesn’t seem to find the libraries installed in the Docker image:
PYSPARK_PYTHON=ipython \
PYSPARK_DRIVER_PYTHON=ipython \
pyspark \
  --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
  --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=$DOCKER_IMAGE_NAME \
  --conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
  --conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=$DOCKER_IMAGE_NAME