I’m trying to make a connection from PySpark to Cassandra inside a virtual environment; the services are installed via Docker. I’ve been using the --packages method to resolve the dependencies, but it doesn’t seem to work either.
I am submitting the job from bash and specifying the jar files like this:
    ./venv/bin/spark-submit \
      --master local[*] \
      --jars /opt/bitnami/spark/jars/spark-cassandra-connector_2.12-3.5.1.jar \
      --packages org.apache.kafka:kafka-clients:3.5.1,org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.1 \
      --conf spark.cassandra.connection.host=cassandra \
      --conf spark.sql.extensions=com.datastax.spark.connector.CassandraSparkExtensions \
      /home/rama/project/randomuser/spark_stream.py
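For comparison, my understanding is that --packages can pull the connector and its transitive dependencies from Maven coordinates instead of pointing --jars at a single jar. Assuming the connector is published as com.datastax.spark:spark-cassandra-connector_2.12:3.5.1 (matching the version of the jar I have locally), that variant would look like this; I have not confirmed whether it resolves the error:

    ./venv/bin/spark-submit \
      --master local[*] \
      --packages com.datastax.spark:spark-cassandra-connector_2.12:3.5.1,org.apache.kafka:kafka-clients:3.5.1,org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.1 \
      --conf spark.cassandra.connection.host=cassandra \
      --conf spark.sql.extensions=com.datastax.spark.connector.CassandraSparkExtensions \
      /home/rama/project/randomuser/spark_stream.py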
I expected this to submit the job properly, but it still errors out. I assume this is still a dependency problem. The error looks like this:
py4j.protocol.Py4JJavaError: An error occurred while calling o54.start. : java.lang.NoClassDefFoundError: com/datastax/spark/connector/util/Logging
PySpark version: 3.5.1
Scala version: 2.12.18
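For context, the failure happens at the .start() call of a structured streaming query in spark_stream.py. A simplified sketch of the kind of code involved is below; the broker address, topic, keyspace, and table names are placeholders, not my exact script:

    from pyspark.sql import SparkSession

    # The connector and Kafka jars come from spark-submit, so nothing
    # dependency-related is hard-coded in the script itself.
    spark = (
        SparkSession.builder
        .appName("randomuser-stream")
        .getOrCreate()
    )

    # Read the Kafka topic as a streaming DataFrame.
    df = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:29092")  # placeholder address
        .option("subscribe", "users")                       # placeholder topic
        .load()
    )

    # Write each micro-batch to Cassandra through the connector's
    # DataFrame source; starting this query is where the error is raised.
    def write_batch(batch_df, batch_id):
        (
            batch_df.write
            .format("org.apache.spark.sql.cassandra")
            .options(keyspace="demo", table="users")  # placeholder names
            .mode("append")
            .save()
        )

    query = df.writeStream.foreachBatch(write_batch).start()
    query.awaitTermination()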