This is my current code. Since I was getting many errors, I decided to run a basic example first, but I am still getting the error shown below.
# Current Code
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode
from pyspark.sql.functions import split

# Create (or reuse) the Spark session
spark = (
    SparkSession.builder
    .appName("StructuredNetworkWordCount")
    .getOrCreate()
)

# Read lines of text from a socket at localhost:9999
lines = (
    spark.readStream
    .format("socket")
    .option("host", "localhost")
    .option("port", 9999)
    .load()
)

# Split each line into words, one word per row
words = lines.select(
    explode(
        split(lines.value, " ")
    ).alias("word")
)

# Running count for each word
wordCounts = words.groupBy("word").count()

# Print the complete counts table to the console on every trigger
query = (
    wordCounts.writeStream
    .outputMode("complete")
    .format("console")
    .start()
)

query.awaitTermination()
# Error
StreamingQueryException: [STREAM_FAILED] Query [id = 3e19a3c3-4e4e-4da6-9ed0-f9e8a6119894, runId = 25218498-e228-4954-a200-8e2df9accaec] terminated with exception: 'boolean org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(java.lang.String, int)'
I tried replacing the winutils.exe file in the Hadoop bin folder.
I also tried using the code directly from the PySpark documentation, thinking it would work.
But neither attempt made any difference.
I have resolved many other errors, but I cannot get past this one.
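
To make the setup easier to reproduce, here is a minimal sketch of how the Hadoop bin folder could be checked on Windows. It assumes that the HADOOP_HOME environment variable points at the Hadoop folder and that both winutils.exe and hadoop.dll are expected in its bin directory. The hadoop.dll part is an assumption on my side, based on NativeIO$Windows.access0 being a native method. check_hadoop_native is a hypothetical helper name and is not part of PySpark.

```python
import os
from pathlib import Path


def check_hadoop_native(hadoop_home=None):
    """Return a dict mapping each expected file name to whether it exists.

    Returns an empty dict when no Hadoop folder is known.
    """
    # Fall back to the HADOOP_HOME environment variable when no path is given.
    home = hadoop_home or os.environ.get("HADOOP_HOME")
    if not home:
        return {}
    bin_dir = Path(home) / "bin"
    # Assumption: these are the two files the Windows native IO layer needs.
    expected = ("winutils.exe", "hadoop.dll")
    return {name: (bin_dir / name).is_file() for name in expected}


if __name__ == "__main__":
    result = check_hadoop_native()
    if not result:
        print("HADOOP_HOME is not set")
    for name, found in result.items():
        print(f"{name}: {'found' if found else 'missing'}")
```

This only confirms that the files are present. It does not check that they match the Hadoop version bundled with the installed Spark build.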