I submit a Dataproc job as below:

gcloud dataproc jobs submit pyspark \
    gs://persopnal/test/file1.py \
    --py-files gs://persopnal/test/file1.py,gs://persopnal/test/file2.py \
    --jars $jar \
    --archives gs://persopnal/test/py37-p-env.zip$ImageEnv \
    --properties spark.yarn.appMasterEnv.PYSPARK_HOME=./ImageEnv/py37-p-env/bin/python \
    --region $region \
    --cluster $cluster \
    --project $project \
    -- \
    -i 1 \
    -p 1
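For context, the -i 1 -p 1 flags after the -- separator are plain job arguments that file1.py reads itself. A minimal hypothetical sketch of that argument handling (the option meanings here are only illustrative, not the real file1.py) looks like this:

```python
# Hypothetical sketch of how file1.py might consume the job arguments
# passed after "--" (the real options in file1.py may differ).
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("-i", type=int, default=1, help="illustrative placeholder")
parser.add_argument("-p", type=int, default=1, help="illustrative placeholder")
args = parser.parse_args()

print("i =", args.i, "p =", args.p)
```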
However, the job throws an exception like this:

Cannot run program "./ImageEnv/py37-p-env/bin/python": error=2, No such file or directory
When I unpack gs://persopnal/test/py37-p-env.zip locally, I can find ./ImageEnv/py37-p-env/bin/python. Is there any problem with this ZIP archive?
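For reference, this is roughly how the archive contents can be checked after downloading the zip (a minimal sketch, assuming a local copy named py37-p-env.zip in the current directory):

```python
# Inspect a local copy of gs://persopnal/test/py37-p-env.zip to see what the
# top-level entries are and where the interpreter sits inside the archive.
import zipfile

ARCHIVE = "py37-p-env.zip"  # assumed local copy of the GCS object

with zipfile.ZipFile(ARCHIVE) as zf:
    names = zf.namelist()
    print("top-level entries:", sorted({n.split("/", 1)[0] for n in names}))
    print("interpreter entries:", [n for n in names if n.endswith("bin/python")])
```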
I want to know why this error occurs and how to solve it. There are too many files in py37-p-env.zip for me to add them individually via the --files parameter.
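In case it helps, a debugging sketch like the one below could be added to file1.py to show which interpreter actually runs the driver and what the driver and executor working directories contain (this is only an illustrative addition, assuming the driver starts at all; it is not part of my current job):

```python
# Debugging sketch: print the interpreter path and working-directory contents
# on the driver and on one executor, to confirm whether the ImageEnv archive
# directory is actually present where Spark runs Python.
import os
import sys

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

print("driver python:", sys.executable)
print("driver cwd:", os.getcwd(), sorted(os.listdir(".")))

def inspect(_):
    import os, sys
    return [(sys.executable, sorted(os.listdir(".")))]

# Run the same check inside a single executor task.
print("executor view:", sc.parallelize([0], numSlices=1).flatMap(inspect).collect())
```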