I have built a fat jar from my project, which uses Spark to read some CSVs and write them to Kafka. The project is written in Java; when I run the code in IntelliJ IDEA it works fine, but when I run the jar file it fails with org.apache.spark.sql.AnalysisException: Failed to find data source: kafka.
A. I used IntelliJ IDEA to build the jar file, but when running it the Main class wasn't loaded or found:
java -jar project_jar.jar nilian.Main
(I suspect part of the problem is that with -jar the JVM takes the main class from the jar's manifest, and nilian.Main on the command line is just treated as a program argument.) So I completely gave up on this one.
B. I used Maven to build the jar file, first with this Build configuration:
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>3.2.4</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <filters>
              <filter>
                <artifact>*:*</artifact>
                <excludes>
                  <exclude>META-INF/*.SF</exclude>
                  <exclude>META-INF/*.DSA</exclude>
                  <exclude>META-INF/*.RSA</exclude>
                </excludes>
              </filter>
              <filter>
                <artifact>org.apache.spark:spark-core_2.13</artifact>
                <excludes>
                  <exclude>sun/nio/ch/DirectBuffer</exclude>
                </excludes>
              </filter>
            </filters>
            <transformers>
              <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                <mainClass>nilian.Main</mainClass>
              </transformer>
            </transformers>
            <createDependencyReducedPom>false</createDependencyReducedPom>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
with this result while running the jar file:
Exception in thread "Thread-25" org.apache.spark.sql.AnalysisException: Failed to find data source: kafka. Please deploy the application as per the deployment section of Structured Streaming + Kafka Integration Guide.
at org.apache.spark.sql.errors.QueryCompilationErrors$.failedToFindKafkaDataSourceError(QueryCompilationErrors.scala:1568)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:645)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:697)
at org.apache.spark.sql.DataFrameWriter.lookupV2Provider(DataFrameWriter.scala:863)
at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:257)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:248)
at nilian.outputstuff.writers.KafkaWriter.write(KafkaWriter.java:21)
at nilian.outputstuff.MyOutPutResolver.resolveTrafficOutPut(MyOutPutResolver.java:53)
at nilian.threadstuff.runnables.TrafficRunnable.run(TrafficRunnable.java:171)
at java.base/java.lang.Thread.run(Thread.java:840)
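From what I've read, a common cause of "Failed to find data source: kafka" in shaded jars is that the META-INF/services registration files (which Spark's DataSource.lookupDataSource uses to discover providers) get overwritten instead of merged during shading. If that is what's happening here, I assume the fix would be to add a ServicesResourceTransformer next to the existing ManifestResourceTransformer, roughly like this — is that the right direction?

```xml
<!-- sketch: merge META-INF/services entries so the data-source
     registration from spark-sql-kafka survives shading -->
<transformers>
  <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
  <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
    <mainClass>nilian.Main</mainClass>
  </transformer>
</transformers>
```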
C. I then tried this build configuration, with the same result:
<build>
  <plugins>
    <plugin>
      <artifactId>maven-assembly-plugin</artifactId>
      <configuration>
        <archive>
          <manifest>
            <mainClass>nilian.Main</mainClass>
          </manifest>
        </archive>
        <descriptorRefs>
          <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
        <transformers>
          <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
            <mainClass>nilian.Main</mainClass>
          </transformer>
        </transformers>
      </configuration>
      <executions>
        <execution>
          <id>make-assembly</id>
          <phase>package</phase>
          <goals>
            <goal>single</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
D. I should also mention that I tried running the jar with the Kafka jars on the classpath, but that didn't work out either — I still got the error about not finding the kafka source (and I suspect -cp is ignored here anyway, since -jar takes the classpath from the jar's manifest):
java --add-exports=java.base/sun.nio.ch=ALL-UNNAMED -cp "project_jar.jar:KAFKA_JARs/*" -jar project_jar.jar
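For completeness: the error message refers to the deployment section of the Structured Streaming + Kafka Integration Guide, which recommends supplying the connector at submit time instead of baking it into the jar. I assume the equivalent for my project would be something like the following — the artifact version and Scala suffix are guesses based on my spark-core_2.13 dependency, so they may need adjusting:

```shell
# sketch: let spark-submit resolve the Kafka connector at launch;
# _2.13 / 3.4.1 are assumptions matching my build, not verified
spark-submit \
  --class nilian.Main \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.13:3.4.1 \
  project_jar.jar
```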
Thanks for your help!