I’m running a Glue Python shell script, and I pass --extra-py-files with S3 paths to wheels I’ve built for the script. These are installed as expected.
When I attach a Glue Connection to the job details, in order to allow a Redshift connection from the script, the job doesn’t install from the S3 paths but instead tries to connect to pypi.org to install them.
I get the following error:
__main__.CommandFailedException: /tmp/glue-python-libs-cVlK/SQLAlchemy-1.4.52-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl installation failed after 2th retry due to exception: CalledProcessError
Can I allow my Glue script to connect to Redshift while still installing the dependencies directly from the S3 wheels rather than from pypi.org? Otherwise I’ll have to create a connection and/or NAT gateway to allow this, which I’d like to avoid.
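For reference, a minimal sketch of the setup described above, assuming the job is defined through boto3. The helper build_job_definition and every name, ARN, and S3 path below are hypothetical placeholders, not the actual job; the keys follow the parameters accepted by boto3's Glue create_job call.

```python
def build_job_definition(job_name, role_arn, script_s3_path,
                         wheel_s3_paths, connection_name=None):
    """Build keyword arguments for boto3's glue.create_job() (hypothetical helper)."""
    job = {
        "Name": job_name,
        "Role": role_arn,
        "Command": {
            "Name": "pythonshell",          # Python shell job, not a Spark job
            "ScriptLocation": script_s3_path,
            "PythonVersion": "3.9",
        },
        "DefaultArguments": {
            # Comma-separated S3 paths to the prebuilt wheels
            "--extra-py-files": ",".join(wheel_s3_paths),
        },
        "MaxCapacity": 0.0625,
    }
    if connection_name:
        # Attaching the Glue Connection is the step after which the
        # wheel installation starts failing in the scenario above.
        job["Connections"] = {"Connections": [connection_name]}
    return job


# Placeholder values for illustration only.
job = build_job_definition(
    job_name="example-python-shell-job",
    role_arn="arn:aws:iam::123456789012:role/example-glue-role",
    script_s3_path="s3://example-bucket/scripts/job.py",
    wheel_s3_paths=[
        "s3://example-bucket/wheels/first_dependency.whl",
        "s3://example-bucket/wheels/second_dependency.whl",
    ],
    connection_name="example-redshift-connection",
)

# To submit it, one would call: boto3.client("glue").create_job(**job)
```

The only difference between the working and failing configurations in this sketch is the Connections entry.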
Note the following limitations of Python Shell jobs:
- You can’t use job bookmarks with Python shell jobs.
- You can’t package any Python libraries as .egg files in Python 3.9+. Instead, use .whl.
- The --extra-files option cannot be used, because of a limitation on temporary copies of S3 data.
https://docs.aws.amazon.com/glue/latest/dg/add-job-python.html#python-shell-limitations