I am trying to run a SageMaker pipeline with multiple data inputs in my data preprocessing script. When I run the pipeline with just one data input it runs, but when I introduce a second data input, which I need to merge in the preprocessing step, I receive two different kinds of errors:
- 'FailureReason': 'ClientError: AlgorithmError: , exit code: 1',
- 'FailureReason': 'ClientError: Failed to invoke sagemaker:CreateProcessingJob. Error Details: null',
Note: I have been able to successfully run the pipeline with each data input individually, so I know the pipeline is able to read the data in. There’s something about the way I am introducing the multiple parameters/data inputs that is not working.
This is what my parameters look like:

    from sagemaker.workflow.parameters import (
        ParameterInteger,
        ParameterString,
    )

    processing_instance_count = ParameterInteger(name="ProcessingInstanceCount", default_value=1)
    data_input1 = ParameterString(name="data1", default_value=data1)
    data_input2 = ParameterString(name="data2", default_value=data2)

My processing step:

    from sagemaker.processing import ProcessingInput, ProcessingOutput
    from sagemaker.workflow.steps import ProcessingStep

    process_step = ProcessingStep(
        name="...",
        processor=sklearn_processor,
        inputs=[
            ProcessingInput(source=data1, destination="/opt/ml/processing/input"),
            ProcessingInput(source=data2, destination="/opt/ml/processing/input"),
        ],
        outputs=[
            ProcessingOutput(output_name="...", source="/opt/ml/processing/output"),
        ],
        code=preprocess_script_uri,
    )

And then the pipeline definition:

    from sagemaker.workflow.pipeline import Pipeline

    pipeline_name = f"..."
    pipeline = Pipeline(
        name=pipeline_name,
        parameters=[
            processing_instance_count,
            data_input1,
            data_input2,
        ],
        steps=[process_step],
    )

In the code that constructs the ProcessingStep, you specify two ProcessingInputs with the same destination path ("/opt/ml/processing/input"). The ml-ops sample notebooks in the amazon-sagemaker-examples repo use a different destination path for each ProcessingInput when there are several. Please try specifying different paths and check whether that resolves the issue.
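
To illustrate the suggestion, here is a minimal sketch of how the preprocessing script might read and merge the two inputs once each one has its own destination. Everything specific in it is an assumption rather than something taken from the question: the `data1` and `data2` subdirectory names, the inputs being CSV files, and the shared `id` key column are all hypothetical. It uses only the standard library so it can be run outside SageMaker; a real script would likely use pandas instead.

```python
import csv
from pathlib import Path

# Hypothetical layout: each ProcessingInput gets its own subdirectory, e.g.
#   ProcessingInput(source=data1, destination="/opt/ml/processing/input/data1")
#   ProcessingInput(source=data2, destination="/opt/ml/processing/input/data2")
BASE_DIR = Path("/opt/ml/processing/input")


def read_rows(input_dir):
    """Read every CSV file found in one input directory into a list of dicts."""
    rows = []
    for csv_path in sorted(Path(input_dir).glob("*.csv")):
        with open(csv_path, newline="") as f:
            rows.extend(csv.DictReader(f))
    return rows


def merge_on_key(left_rows, right_rows, key="id"):
    """Inner-join two lists of row dicts on a key column.

    Assumes the key (hypothetical name "id") is unique in right_rows;
    if it is not, the last matching right-hand row wins.
    """
    right_by_key = {row[key]: row for row in right_rows}
    merged = []
    for row in left_rows:
        match = right_by_key.get(row[key])
        if match is not None:
            merged.append({**row, **match})
    return merged


def main(base_dir=BASE_DIR):
    """Load both inputs from their separate directories and merge them."""
    left = read_rows(Path(base_dir) / "data1")
    right = read_rows(Path(base_dir) / "data2")
    return merge_on_key(left, right)
```

In the actual script you would call `main()` and write the merged rows under /opt/ml/processing/output. Keeping the two inputs in separate directories means the script never has to guess which file came from which source.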