I have a minikube cluster installed on my local machine and have installed spark-operator on it. I have also set up the necessary service account and RBAC for it. However, when I try to run my PySpark application as a 'SparkApplication', it fails with the error below.
Error: failed to start container "spark-kubernetes-driver": Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "driver": executable file not found in $PATH: unknown
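For reference, the service account and RBAC were set up roughly like this (following the spark-operator quick-start; the Role/RoleBinding names here are approximate, but the 'spark' service account and 'spark-apps' namespace match the manifest further down):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark
  namespace: spark-apps
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: spark-role
  namespace: spark-apps
rules:
# the driver needs to create and manage executor pods and related resources
- apiGroups: [""]
  resources: ["pods", "services", "configmaps"]
  verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spark-role-binding
  namespace: spark-apps
subjects:
- kind: ServiceAccount
  name: spark
  namespace: spark-apps
roleRef:
  kind: Role
  name: spark-role
  apiGroup: rbac.authorization.k8s.io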
However, when I run the same image as a plain Kubernetes Job, it succeeds, which tells me there is no issue with the application code itself. Running the image directly as a Docker container also works.
I am not sure what is wrong.
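For comparison, the Job that runs successfully is roughly this (the name is arbitrary; no command/args are set, so the container just runs the image's ENTRYPOINT):

apiVersion: batch/v1
kind: Job
metadata:
  name: spark-app-job
  namespace: spark-apps
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: spark-app
        image: image:tag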
Below is the Dockerfile I am using:
# Use an official OpenJDK runtime as a parent image
FROM openjdk:8-jdk
# Set environment variables
ENV PYSPARK_PYTHON=python3
ENV PYTHON_VERSION=3.8.12
ENV SPARK_VERSION=3.2.0
ENV HADOOP_VERSION=3.2
# Install Python and necessary dependencies
RUN apt-get update && \
    apt-get install -y python3 python3-pip && \
    apt-get clean
# Install PySpark
RUN pip3 install pyspark==$SPARK_VERSION
# Set up environment variables for Spark
ENV SPARK_HOME=/spark
ENV PATH=$PATH:$SPARK_HOME/bin
ENV PYTHONPATH=$SPARK_HOME/python:$PYTHONPATH
# Download and extract Apache Spark
RUN wget -qO- https://archive.apache.org/dist/spark/spark-$SPARK_VERSION/spark-$SPARK_VERSION-bin-hadoop$HADOOP_VERSION.tgz | tar -xvz -C / && \
    mv /spark-$SPARK_VERSION-bin-hadoop$HADOOP_VERSION /spark
# Set working directory
WORKDIR /app
# Copy your application files to the container
COPY . .
RUN mkdir /app/jars
# Download dependency jars
RUN wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-azure/3.3.1/hadoop-azure-3.3.1.jar -P /app/jars
RUN chmod 775 ./src/python/spark_app.py
# Define the command to run your application
ENTRYPOINT ["python3","./src/python/spark_app.py"]
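For completeness, I build the image and make it available to minikube roughly like this ("image:tag" is a placeholder, same as in the manifest below):

# build locally, then push the image into the minikube container runtime
docker build -t image:tag .
minikube image load image:tag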
Below is the application manifest:
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: spark-apps
spec:
  type: Python
  mode: cluster
  image: "image:tag"
  imagePullPolicy: IfNotPresent
  mainApplicationFile: "local:///app/src/python/spark_app.py"
  sparkVersion: "3.2.0"
  restartPolicy:
    type: Never
  driver:
    cores: 1
    memory: "512m"
    labels:
      version: 3.2.0
    serviceAccount: spark
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 3.2.0
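I submit the application and inspect the failing driver pod like this (the file name is just where I saved the manifest; the operator names the driver pod <app-name>-driver, so spark-pi-driver here):

kubectl apply -f spark-pi.yaml
kubectl -n spark-apps get sparkapplication spark-pi
# the error above shows up in the driver pod's container status/events
kubectl -n spark-apps describe pod spark-pi-driver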