PySpark production best practices with Docker containers
I’m quite new to PySpark and was looking for advice on how to set up production environments with Docker. I’m building an ML pipeline that continuously consumes events from Kafka. I mention this to emphasise that (a) my main.py will have many dependencies and (b) the job will be running on the cluster at all times. Digging around, there seem to be quite a few options (a couple are mentioned below), but I was looking for guidance on best practices.
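For context, a minimal sketch of what the long-running main.py entry point might look like; the broker address, topic name, and checkpoint path are placeholders, and pulling the Kafka connector via spark.jars.packages (rather than baking all jars/wheels into the Docker image) is just one of the packaging choices in question:

```python
from pyspark.sql import SparkSession

# Sketch of a continuously running Kafka consumer job (names are placeholders).
spark = (
    SparkSession.builder
    .appName("kafka-ml-pipeline")
    # One option: declare the Kafka connector as a package dependency here;
    # another is to bake it into the Docker image. Version must match your Spark build.
    .config("spark.jars.packages",
            "org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.0")
    .getOrCreate()
)

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")  # placeholder broker
    .option("subscribe", "events")                    # placeholder topic
    .load()
)

# Downstream ML/feature logic would go here; writeStream keeps the job alive.
query = (
    events.selectExpr("CAST(value AS STRING) AS value")
    .writeStream
    .format("console")
    .option("checkpointLocation", "/tmp/checkpoints/events")  # placeholder path
    .start()
)
query.awaitTermination()
```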
Spark cluster in Docker: how to create a SparkSession from IntelliJ IDEA on Windows
I installed a Spark cluster in Docker. The Spark master container's ports are 0.0.0.0:7077->7077/tcp, 6066/tcp, 0.0.0.0:8080->8080/tcp and its container ID is 894e4b6f96bb. How do I create a SparkSession? Please guide me; I am unable to find the hostname to use. This is how I am trying to create the SparkSession. The Spark cluster is installed on the same Docker network where Hadoop and Hive are installed.
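A minimal sketch of how the SparkSession might be created, assuming the master's port 7077 is published to the host as shown above: from IntelliJ on Windows the published port is reachable as localhost, while from another container on the same Docker network the master's container or service name would replace localhost. The spark.driver.host setting and host.docker.internal are assumptions that apply to Docker Desktop setups where executors inside containers must reach the driver on the host:

```python
from pyspark.sql import SparkSession

# Connect to the standalone master whose port 7077 is published on the host.
# From Windows/IntelliJ on the host: spark://localhost:7077
# From a container on the same Docker network: spark://<master-container-name>:7077
spark = (
    SparkSession.builder
    .master("spark://localhost:7077")
    .appName("docker-cluster-test")
    # Assumption (Docker Desktop): lets containerised executors reach the
    # driver process running on the Windows host.
    .config("spark.driver.host", "host.docker.internal")
    .getOrCreate()
)

spark.range(10).show()  # quick sanity check against the cluster
spark.stop()
```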