Unable to launch SageMaker training job due to issues with the container image

I am trying to launch a SageMaker training job from within a SageMaker Studio Code Editor instance. I have a custom Docker image with a requirements.txt file listing a series of Python libraries to be installed. This process leverages the SageMaker training toolkit; unfortunately, that collection of libraries depends on a module that is deprecated in Python 3.10 and later (see this open GitHub issue for more). So I decided to run my container on Python 3.9, but that is now causing its own set of problems related to the toolkit's sagemaker-containers library. When I try to launch the training job, I get this error: