I am working on a project involving Step Functions with SageMaker. I have an existing Step Function that I need to integrate SageMaker into, so I added steps for processing, model training, model registration, and batch transform job requests. I also appended .sync to the end of each resource so that each step waits for the previous one to complete before the next one starts.
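For context, a chain of SageMaker Task states using the .sync pattern could be generated as below. This is a minimal Python sketch, assuming the Step Functions optimized-integration resource ARNs for CreateProcessingJob, CreateTrainingJob and CreateTransformJob. The state names and the "$.…_job_name" input paths are hypothetical placeholders, real jobs need many more parameters, and the model-registration step is omitted to keep it short.

```python
import json

# Optimized Step Functions integrations for SageMaker. The ".sync" suffix makes
# each Task state wait until the SageMaker job finishes before moving on.
SYNC_RESOURCES = {
    "Processing": "arn:aws:states:::sagemaker:createProcessingJob.sync",
    "Training": "arn:aws:states:::sagemaker:createTrainingJob.sync",
    "BatchTransform": "arn:aws:states:::sagemaker:createTransformJob.sync",
}


def build_sequential_definition(steps):
    """Chain (state_name, parameters) pairs into sequential .sync Task states."""
    states = {}
    for index, (name, parameters) in enumerate(steps):
        state = {
            "Type": "Task",
            "Resource": SYNC_RESOURCES[name],
            "Parameters": parameters,
        }
        if index + 1 < len(steps):
            # The next state only starts after this job reaches a terminal state.
            state["Next"] = steps[index + 1][0]
        else:
            state["End"] = True
        states[name] = state
    return {"StartAt": steps[0][0], "States": states}


# Hypothetical placeholder parameters; real requests need the full job configuration.
definition = build_sequential_definition([
    ("Processing", {"ProcessingJobName.$": "$.processing_job_name"}),
    ("Training", {"TrainingJobName.$": "$.training_job_name"}),
    ("BatchTransform", {"TransformJobName.$": "$.transform_job_name"}),
])
print(json.dumps(definition, indent=2))
```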
However, I ran into a problem with the SageMaker processing job step. The job starts and runs my script, but it never finishes because saving my processed pandas DataFrame to a CSV file fails with a permission denied error. My processing script (processing.py) looks like this:
# dependency imports
import pandas as pd
df = pd.read_csv("/opt/ml/processing/input/data/data.csv")
print(df.head())
# some processing on df
df.to_csv("/opt/ml/processing/output/result.csv", index=False)
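One way to narrow down the PermissionError is to log, from inside the container and just before the df.to_csv call, which user the script runs as and who owns the output directory. This is a minimal diagnostic sketch, assuming a Linux-based container image; describe_dir_permissions is a hypothetical helper, not part of any SageMaker API.

```python
import os
import stat


def describe_dir_permissions(path):
    """Report the current user and whether it can create files in `path`."""
    info = {
        "path": path,
        "uid": os.getuid(),
        "gid": os.getgid(),
        "exists": os.path.isdir(path),
    }
    if info["exists"]:
        st = os.stat(path)
        info["owner_uid"] = st.st_uid
        info["owner_gid"] = st.st_gid
        info["mode"] = stat.filemode(st.st_mode)  # e.g. "drwxr-xr-x"
        # Creating a file needs write and execute (search) permission on the directory.
        info["writable"] = os.access(path, os.W_OK | os.X_OK)
    else:
        info["writable"] = False
    return info


if __name__ == "__main__":
    # Printed output ends up in the processing job's CloudWatch logs.
    print(describe_dir_permissions("/opt/ml/processing/output"))
```

If the log showed a non-root uid and a directory owned by root without write permission for that user, that would be consistent with the error above.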
Here is my state machine configuration for the processing request (please leave a comment if you need to see other parts of my config):
{
  "AppSpecification": {
    "ContainerEntryPoint": [
      "python3",
      "/opt/ml/processing/input/code/processing.py"
    ]
  },
  "ProcessingInputs": [
    {
      "InputName": "Input-1",
      "S3Input": {
        "S3Uri": "s3://my-dataset/data.csv",
        "LocalPath": "/opt/ml/processing/input/data"
      }
    },
    {
      "InputName": "Input-2",
      "S3Input": {
        "S3Uri": "s3://my-dataset/processing.py",
        "LocalPath": "/opt/ml/processing/input/code"
      }
    }
  ],
  "ProcessingOutputConfig": {
    "Outputs": [
      {
        "OutputName": "Output-1",
        "S3Output": {
          "S3Uri": "s3://my-dataset/data.csv",
          "LocalPath": "/opt/ml/processing/output/",
          "S3UploadMode": "EndOfJob"
        }
      }
    ]
  }
}
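For reference, the sketch below computes where output files would land in S3 under the configuration above. It assumes that SageMaker uploads every file under an output's LocalPath to that output's S3Uri, treated as a prefix, with the relative path preserved; expected_s3_uris is a hypothetical helper. Under that assumption, because Output-1 reuses the input object key (data.csv) as its S3Uri, result.csv would end up under s3://my-dataset/data.csv/ rather than replacing data.csv.

```python
import posixpath

# Output section of the processing request above.
OUTPUT_CONFIG = {
    "Outputs": [
        {
            "OutputName": "Output-1",
            "S3Output": {
                "S3Uri": "s3://my-dataset/data.csv",
                "LocalPath": "/opt/ml/processing/output/",
                "S3UploadMode": "EndOfJob",
            },
        }
    ]
}


def expected_s3_uris(output_config, local_files):
    """Map container file paths to the S3 URIs they would be uploaded to.

    Assumes each file under an output's LocalPath is uploaded to that output's
    S3Uri prefix, keeping its path relative to LocalPath.
    """
    mapping = {}
    for output in output_config["Outputs"]:
        s3_output = output["S3Output"]
        local_root = s3_output["LocalPath"].rstrip("/") + "/"
        for local_file in local_files:
            if local_file.startswith(local_root):
                relative = local_file[len(local_root):]
                mapping[local_file] = posixpath.join(s3_output["S3Uri"].rstrip("/"), relative)
    return mapping


print(expected_s3_uris(OUTPUT_CONFIG, ["/opt/ml/processing/output/result.csv"]))
```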
The ProcessingInputs configuration works as expected: the log shows the contents of data.csv printed by df.head(). However, when the script reaches its last line (the df.to_csv call), I get the following error:
PermissionError: Permission Denied '/opt/ml/processing/output/result.csv'
I also tried saving to other folders I saw in examples online, such as training and result, but no luck so far: I get the same permission error every time. I also created a Lambda function just for this, used it to submit the same SageMaker processing job request, and got exactly the same permission denied error.
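A Lambda function like the one mentioned above could submit an equivalent request through boto3 roughly as follows. This is a minimal sketch, not the code I actually used: it uses the boto3 field names (for example ContainerEntrypoint), adds S3DataType and S3InputMode to each input, and the event keys, image URI, role ARN, instance type, volume size and runtime limit are all hypothetical values. The optional sagemaker_client argument only exists so the handler can be exercised without AWS.

```python
def build_processing_request(job_name, role_arn, image_uri):
    """Build a CreateProcessingJob request mirroring the state machine config."""
    def s3_input(name, uri, local_path):
        return {
            "InputName": name,
            "S3Input": {
                "S3Uri": uri,
                "LocalPath": local_path,
                "S3DataType": "S3Prefix",
                "S3InputMode": "File",
            },
        }

    return {
        "ProcessingJobName": job_name,
        "RoleArn": role_arn,  # hypothetical execution role
        "AppSpecification": {
            "ImageUri": image_uri,  # hypothetical processing image
            "ContainerEntrypoint": ["python3", "/opt/ml/processing/input/code/processing.py"],
        },
        "ProcessingInputs": [
            s3_input("Input-1", "s3://my-dataset/data.csv", "/opt/ml/processing/input/data"),
            s3_input("Input-2", "s3://my-dataset/processing.py", "/opt/ml/processing/input/code"),
        ],
        "ProcessingOutputConfig": {
            "Outputs": [
                {
                    "OutputName": "Output-1",
                    "S3Output": {
                        "S3Uri": "s3://my-dataset/data.csv",
                        "LocalPath": "/opt/ml/processing/output/",
                        "S3UploadMode": "EndOfJob",
                    },
                }
            ]
        },
        # Hypothetical sizing and timeout values.
        "ProcessingResources": {
            "ClusterConfig": {"InstanceCount": 1, "InstanceType": "ml.m5.xlarge", "VolumeSizeInGB": 30}
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


def lambda_handler(event, context, sagemaker_client=None):
    """Submit the processing job; event keys are hypothetical."""
    if sagemaker_client is None:
        import boto3  # available in the AWS Lambda Python runtime
        sagemaker_client = boto3.client("sagemaker")
    request = build_processing_request(event["job_name"], event["role_arn"], event["image_uri"])
    response = sagemaker_client.create_processing_job(**request)
    return {"ProcessingJobArn": response["ProcessingJobArn"]}
```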
I also tried saving to a path completely outside /opt/ml/processing/, such as /result.csv, but that gave a different error, since SageMaker only allows saving CSV files under /opt/ml/processing/…, so I am not sure what to do.
As a workaround, I currently save the result set to S3 manually with the boto3 API, then wait for the processing job to exceed the StoppingCondition.MaxRuntimeInSeconds limit I set. The job eventually stops, and I use an additional step to pick up the result.
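The manual boto3 save described above can be done without writing to the container's filesystem at all, by uploading the CSV text directly. This is a minimal sketch of that kind of workaround, not my exact code: it assumes the job's execution role is allowed to call s3:PutObject on the bucket, and upload_csv_text, split_s3_uri and the destination key are hypothetical.

```python
from urllib.parse import urlparse


def split_s3_uri(s3_uri):
    """Split 's3://bucket/key/parts' into ('bucket', 'key/parts')."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {s3_uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def upload_csv_text(csv_text, s3_uri, s3_client=None):
    """Upload CSV content straight to S3, bypassing ProcessingOutputConfig."""
    if s3_client is None:
        import boto3  # needs to be installed in the processing image
        s3_client = boto3.client("s3")
    bucket, key = split_s3_uri(s3_uri)
    s3_client.put_object(Bucket=bucket, Key=key, Body=csv_text.encode("utf-8"))
    return bucket, key
```

In processing.py this would replace the last line with something like upload_csv_text(df.to_csv(index=False), "s3://my-dataset/result.csv"), since DataFrame.to_csv returns the CSV as a string when no path is given.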
I dislike this workaround, though, and I really need to find a proper way to resolve the problem.
Can someone tell me what I am missing?