I’m testing a SageMaker pipeline containing a number of typical steps, e.g. “preprocessor” and “training”. If I run it with a local pipeline session, i.e.
session = LocalPipelineSession(default_bucket=default_bucket)
with instance_type="local" (or "local_gpu"), it works fine, but the model artifacts either aren’t uploaded to S3 or are uploaded to a location I cannot find.
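For context, the session and training step are wired up roughly like the sketch below (the estimator class, step names, image URI, role and inputs are illustrative placeholders rather than my exact code):

from sagemaker.estimator import Estimator
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.pipeline_context import LocalPipelineSession
from sagemaker.workflow.steps import TrainingStep

session = LocalPipelineSession(default_bucket=default_bucket)

# The estimator runs in a local container instead of a managed instance
estimator = Estimator(
    image_uri=training_image_uri,  # placeholder
    role=role,                     # placeholder
    instance_count=1,
    instance_type="local",         # or "local_gpu"
    output_path=f"s3://{default_bucket}/artifacts",  # where I expected model.tar.gz to land
    sagemaker_session=session,
)

# Under a LocalPipelineSession, estimator.fit() captures step arguments
# rather than starting a training job immediately
step_train = TrainingStep(
    name="training",
    step_args=estimator.fit(inputs=training_inputs),  # placeholder inputs from the preprocessor step
)

pipeline = Pipeline(
    name="local-test-pipeline",
    steps=[step_preprocess, step_train],  # step_preprocess is the "preprocessor" step
    sagemaker_session=session,
)
pipeline.upsert(role_arn=role)
execution = pipeline.start()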
The last lines of the training log are:
gi2m2zn2lo-algo-1-1fq0s | 2024-06-30 00:36:13,186 sagemaker-training-toolkit INFO Waiting for the process to finish and give a return code.
gi2m2zn2lo-algo-1-1fq0s | 2024-06-30 00:36:13,186 sagemaker-training-toolkit INFO Done waiting for a return code. Received 0 from exiting process.
gi2m2zn2lo-algo-1-1fq0s | 2024-06-30 00:36:13,186 sagemaker-training-toolkit INFO Reporting training SUCCESS
INFO:root:creating /tmp/tmpwj08bwki/artifacts/output/data
INFO:root:copying /tmp/tmpwj08bwki/algo-1-1fq0s/output/success -> /tmp/tmpwj08bwki/artifacts/output
INFO:root:copying /tmp/tmpwj08bwki/model/final-model.pt -> /tmp/tmpwj08bwki/artifacts/model
gi2m2zn2lo-algo-1-1fq0s exited with code 0
The log also contains the generated docker-compose.yaml file, so I can see that local temporary volumes were used at one stage, i.e.
volumes:
- /tmp/tmpwj08bwki/algo-1-1fq0s/input:/opt/ml/input
- /tmp/tmpwj08bwki/algo-1-1fq0s/output/data:/opt/ml/output/data
- /tmp/tmpwj08bwki/algo-1-1fq0s/output:/opt/ml/output
- /tmp/tmpwj08bwki/model:/opt/ml/model
- /tmp/tmpk1mymf_t:/opt/ml/input/data/pretrained_model
- /tmp/tmp0otdsjaf:/opt/ml/input/data/train
- /tmp/tmpvv7ptjnb:/opt/ml/input/data/dev
- /tmp/tmp5hmqgof0:/opt/ml/input/data/test
But being temporary directories, they appear to be removed once the local pipeline execution completes, and my assumption is that local mode skips S3 for artifact storage entirely?
Is there any way of accessing these artifacts? I’d like to perform integration testing to ensure that the “model.tar.gz” contains the correct files, etc.
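For reference, this is the kind of check I want to run once I can locate the artifact; model_tar_members and model_artifact_uri below are hypothetical names, and the URI stands in for wherever model.tar.gz actually ends up:

import tarfile
import tempfile
from urllib.parse import urlparse

import boto3

def model_tar_members(model_artifact_uri):
    """Download model.tar.gz from S3 and return the names of its members."""
    parsed = urlparse(model_artifact_uri)
    bucket, key = parsed.netloc, parsed.path.lstrip("/")
    with tempfile.NamedTemporaryFile(suffix=".tar.gz") as tmp:
        boto3.client("s3").download_file(bucket, key, tmp.name)
        with tarfile.open(tmp.name, mode="r:gz") as tar:
            return {member.name for member in tar.getmembers()}

# e.g. in a pytest integration test:
# assert "final-model.pt" in model_tar_members(model_artifact_uri)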