Red Hat Docker container: connecting PySpark to YARN
When I use PySpark to connect to a YARN cluster, I get the error "Java gateway exited before sending its port number". The traceback passes through the PySpark files context.py, session.py, and java_gateway.py. My Spark version is 3.1.1, my PySpark version is 3.1.1, and my Java version is 1.8. The JAVA_HOME environment variable is already set, and Kerberos is in use.
Could you please tell me where the problem might be?
I have compressed the Python environment and placed it in HDFS, set the Python directory in the cluster, configured the Hive environment variables, and set the driver’s IP address. Communication between the Docker container and the cluster works normally.
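For reference, here is a minimal sketch of the kind of setup described above, together with a pre-flight check that can be run inside the container before the session is created. This error generally means the JVM that PySpark launches through spark-submit exited before reporting its port, so the check only looks at the environment that launch depends on. The helper names, the HDFS path, and the IP address are hypothetical placeholders, not the real values, and the configuration keys shown are only one plausible way to express the setup.

```python
import os


def check_java_environment(env=None):
    """Return a list of problems likely to stop spark-submit from starting the JVM.

    This is an illustrative helper (not part of PySpark). It only inspects
    environment variables and the local filesystem.
    """
    env = os.environ if env is None else env
    problems = []

    # JAVA_HOME must be visible to the Python process itself, not just to a login shell.
    java_home = env.get("JAVA_HOME")
    if not java_home:
        problems.append("JAVA_HOME is not set in this process")
    else:
        java_bin = os.path.join(java_home, "bin", "java")
        if not (os.path.isfile(java_bin) and os.access(java_bin, os.X_OK)):
            problems.append(f"no executable java binary at {java_bin}")

    # With master "yarn", spark-submit needs HADOOP_CONF_DIR or YARN_CONF_DIR.
    if not (env.get("HADOOP_CONF_DIR") or env.get("YARN_CONF_DIR")):
        problems.append("neither HADOOP_CONF_DIR nor YARN_CONF_DIR is set")

    return problems


def build_yarn_conf(archive_uri, driver_host):
    """Build Spark settings for client mode on YARN (assumed layout, placeholder values)."""
    return {
        "spark.master": "yarn",
        "spark.submit.deployMode": "client",
        # The packed Python environment in HDFS, unpacked as ./environment on each node.
        "spark.yarn.dist.archives": f"{archive_uri}#environment",
        # Address of the container that the cluster nodes can reach.
        "spark.driver.host": driver_host,
    }


if __name__ == "__main__":
    for problem in check_java_environment():
        print("problem:", problem)

    # Placeholder path and IP; replace with real values.
    conf = build_yarn_conf("hdfs:///path/to/pyenv.tar.gz", "10.0.0.5")
    for key, value in conf.items():
        print(key, "=", value)
```

If the check reports nothing, running spark-submit --version by hand in the same container and as the same user usually surfaces the underlying JVM error that the gateway message hides.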