I’m encountering an issue when trying to write a DataFrame from Apache Spark to Cassandra. Here’s the setup:
I’m running Apache Spark version 3.3.3 in standalone mode on my local machine (bigdatapc). I have a DataFrame named routesDF that I’m trying to write to a Cassandra table. The DataFrame schema looks like this:
<code>routesDF.printSchema()
root
|-- airline: string (nullable = true)
|-- airline_id: integer (nullable = true)
|-- source_airport: string (nullable = true)
|-- source_airport_id: integer (nullable = true)
|-- destination_airport: string (nullable = true)
|-- destination_airport_id: integer (nullable = true)
|-- codeshare: string (nullable = true)
|-- stops: integer (nullable = true)
|-- equipment: string (nullable = true)
</code>
To write this DataFrame to Cassandra, I’m using the following command in the Spark shell:
<code>routesDF.write
.format("org.apache.spark.sql.cassandra")
.option("keyspace", "practica")
.option("table", "routes")
.mode("Append")
.save()
</code>
I have created this table in Cassandra:
<code>cqlsh:practica> CREATE TABLE routes (
source_airport text,
destination_airport text,
distance int,
PRIMARY KEY (source_airport, destination_airport)
);
</code>
However, I’m encountering the following error:
<code>java.lang.NoClassDefFoundError: com/datastax/spark/connector/util/Logging
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
</code>
It seems like there’s an issue with the spark-cassandra-connector library. I’ve already passed the JAR file spark-cassandra-connector_2.12-3.3.0.jar through the --jars option of the bin/spark-shell command when starting the Spark shell, so I’m not sure why I’m still getting this error.
<code>bin/spark-shell --master spark://bigdatapc:7077 --driver-memory 2G --executor-memory 2G --total-executor-cores 2 --executor-cores 1 --jars postgresql-42.7.3.jar,spark-cassandra-connector_2.12-3.3.0.jar
</code>
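One variation that may be worth checking, sketched here under assumptions rather than as a confirmed fix: passing the connector as Maven coordinates with --packages lets Spark resolve its transitive dependencies, whereas --jars ships only the files listed. The coordinate below is assumed to be the right match for Spark 3.3.x built with Scala 2.12, it requires network access to a Maven repository, and the variable names are illustrative. The script only prints the command so it can be inspected before being run from the Spark home directory.

```shell
# Assumed values: Scala 2.12 build, connector 3.3.0 to match the Spark 3.3.x line.
SCALA_BINARY="2.12"
CONNECTOR_VERSION="3.3.0"
CONNECTOR_COORD="com.datastax.spark:spark-cassandra-connector_${SCALA_BINARY}:${CONNECTOR_VERSION}"

# Same options as the original command; only the connector moves from --jars to --packages.
CMD="bin/spark-shell --master spark://bigdatapc:7077 --driver-memory 2G --executor-memory 2G --total-executor-cores 2 --executor-cores 1 --jars postgresql-42.7.3.jar --packages ${CONNECTOR_COORD}"

# Print the command instead of running it, so it can be reviewed first.
echo "$CMD"
```

The PostgreSQL driver stays in --jars because it is a single self-contained file; only the connector is switched to coordinates.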
Any insights or suggestions on how to resolve this issue would be greatly appreciated. Thank you!
I have tried changing the JAR file to a more recent version, spark-cassandra-connector_2.12-3.5.0.jar, and to the assembly version, spark-cassandra-connector-assembly_2.12-3.5.0.jar. Both attempts were unsuccessful.
The error with the first one is the same as the one shown above, but with the latter I got this:
<code>java.lang.NoSuchMethodError: org.apache.spark.sql.connector.write.streaming.StreamingWrite.useCommitCoordinator()Z
at com.datastax.spark.connector.datasource.CassandraBulkWrite.useCommitCoordinator(CassandraWriteBuilder.scala:116)
at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2(WriteToDataSourceV2Exec.scala:366)
at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2$(WriteToDataSourceV2Exec.scala:353)
at org.apache.spark.sql.execution.datasources.v2.AppendDataExec.writeWithV2(WriteToDataSourceV2Exec.scala:244)
</code>
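As a rough sanity check on this second error, the sketch below compares the major.minor part of the Spark version with that of the connector JAR. It assumes, without confirmation from the stack trace itself, that the connector line is meant to match the Spark line (3.3.x with 3.3.x), which would make a 3.5.0 connector on Spark 3.3.3 a mismatch. The variable names are hypothetical.

```shell
# Versions taken from the question: Spark 3.3.3 and the 3.5.0 assembly JAR.
SPARK_VERSION="3.3.3"
CONNECTOR_VERSION="3.5.0"

# Strip the patch component, keeping only major.minor.
spark_mm="${SPARK_VERSION%.*}"
connector_mm="${CONNECTOR_VERSION%.*}"

# Assumption: the connector's major.minor should equal Spark's major.minor.
if [ "$spark_mm" = "$connector_mm" ]; then
  RESULT="match"
else
  RESULT="mismatch"
fi

echo "Spark ${spark_mm} vs connector ${connector_mm}: ${RESULT}"
```

If the assumption holds, a NoSuchMethodError like the one above would be consistent with the connector having been compiled against a newer Spark API than the one on the classpath.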