PySpark functions not working
I am unable to execute a simple Python script. Here is my source, followed by the problem.
I am trying to run Spark using Docker Compose services.
version: '2'
services:
  spark:
    image: docker.io/bitnami/spark:3.5.1
    environment:
      - SPARK_MODE=master
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
      - SPARK_USER=spark
    ports:
      - '8080:8080'
      - '7077:7077'
    extra_hosts:
      - "host.docker.internal:192.168.1.129"
  spark-worker:
    image: docker.io/bitnami/spark:3.5.1
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark:7077
      - SPARK_WORKER_MEMORY=1G
      - SPARK_WORKER_CORES=1
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
      - SPARK_USER=spark
    extra_hosts:
      - "host.docker.internal:192.168.1.129"
and my PySpark script is as below:
from pyspark.sql import SparkSession

def main():
    # Create a SparkSession pointing at the standalone master
    spark = SparkSession.builder \
        .master("spark://spark-master:7077") \
        .getOrCreate()
    # Create an RDD containing numbers from 1 to 10
    numbers_rdd = spark.sparkContext.parallelize(range(1, 11))
    # Count the elements in the RDD
    count = numbers_rdd.count()
    # print(f"Count of numbers from 1 to 10 is: {count}")
    # Stop the SparkSession
    spark.stop()

if __name__ == "__main__":
    main()
The above script executes and reports that Spark started successfully if I comment out the line calculating count(), but with that line in place it fails with "tuple index out of range" etc.
I have installed Python 3.11 and PySpark version 3.5. Please help me understand what is going wrong.
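For comparison, a minimal local-mode version of the same job can help isolate the failure. This is a sketch; it assumes only that PySpark 3.5 is installed on the driver machine, and it needs no Docker cluster at all (the function name `run_local_count` is made up for illustration):

```python
def run_local_count():
    # Import inside the function so this sketch can be loaded even on a
    # machine without pyspark; running the job itself still requires pyspark.
    from pyspark.sql import SparkSession

    # local[*] runs the driver and executors in a single local process,
    # taking the Docker cluster (and any driver/worker mismatch) out of
    # the picture entirely.
    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName("count-smoke-test")
        .getOrCreate()
    )
    try:
        # Same job as in the question: an RDD of the numbers 1..10, counted.
        return spark.sparkContext.parallelize(range(1, 11)).count()
    finally:
        spark.stop()

# range(1, 11) has exactly 10 elements, so a healthy run returns 10.
```

If this local run returns 10 but the cluster run still fails, the problem lies in the cluster setup rather than in the script itself, for example a Python version mismatch between the driver and the worker containers, or the master URL (`spark://spark-master:7077` in the script vs. the `spark` service name in the compose file).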