I have a SageMaker job that runs and creates a model for making predictions.
The pipeline runs createProcessingJob, createTrainingJob, createModelJob and createTransformJob, in that order.
I have been getting an error in the createProcessingJob and createTransformJob steps.
The Docker image I use for createTransformJob is python:3.11.
From what I found online, I have tried several ways to exit the script:
sys.exit(0)
os._exit(0)
os.abort()
os.kill()
os.kill(os.getpid(), signal.SIGTERM)
None of the above worked.
I found this page: https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-signal-success-failure.html
It made me think that I might be missing a permission.
However, I saw no error message when I checked AWS CloudTrail.
So at the moment I cannot find any error message, but the job is not completing correctly.
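
To make the question concrete, here is a minimal sketch of the exit pattern I understand that page to describe for training jobs: return exit code 0 on success, a non-zero code on failure, and write the failure reason to /opt/ml/output/failure. This is not my actual script. `run_job`, `main` and the `failure_path` parameter are hypothetical names, and whether the same convention applies to processing or transform jobs is an assumption on my part.

```python
import traceback
from pathlib import Path

# Failure file location described for training jobs on the linked page.
# Assumption: it may not apply to processing or transform jobs.
FAILURE_PATH = Path("/opt/ml/output/failure")


def run_job() -> None:
    """Hypothetical placeholder for the real job logic."""
    print("job body finished")


def main(job=run_job, failure_path: Path = FAILURE_PATH) -> int:
    """Run the job and return an exit code: 0 on success, 1 on failure."""
    try:
        job()
    except Exception:
        # Best effort: record the reason, but never let this step
        # hide the original error or change the exit code.
        try:
            failure_path.parent.mkdir(parents=True, exist_ok=True)
            failure_path.write_text(traceback.format_exc())
        except OSError:
            pass
        traceback.print_exc()
        return 1
    return 0
```

The container entrypoint would then end with `sys.exit(main())`, so the exit code is set in exactly one place.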
I can no longer find it, but I also saw an AWS post recommending sys.exit().
That post also mentioned that, depending on the image, the program does not exit by itself.
So if I use python:3.11, what should I do to make the script exit?
Or am I not supposed to use python:3.11?
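
On the point about the program not exiting by itself: one general Python reason is that sys.exit() only raises SystemExit in the calling thread, and the interpreter still waits for any non-daemon threads to finish. The sketch below is a diagnostic I could run just before exiting to rule that out. `lingering_threads` is a hypothetical helper name, and this is only a guess at a cause, since os._exit(0) ends the process regardless of threads and did not help either.

```python
import threading


def lingering_threads() -> list[str]:
    """Return names of live non-daemon threads other than the main thread."""
    main_thread = threading.main_thread()
    return [
        t.name
        for t in threading.enumerate()
        if t is not main_thread and t.is_alive() and not t.daemon
    ]
```

An empty list would mean no background thread is keeping the interpreter alive after sys.exit().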
I do not want to rely on the stopping condition or stop the job manually to handle this case.
Any help would be appreciated.
Thank you very much in advance!! 😉