Here is my AML pipeline using the Python SDK v2.
from azure.ai.ml.dsl import pipeline
from mldesigner import command_component, Input, Output

input_path = Input(type="uri_folder", path="azureml://datastores/test/paths/input")
output_path = Output(type="uri_folder", path="azureml://datastores/test/paths/output")
@command_component(
    name="test_com",
    version="1",
    display_name="test com",
    description="test",
    environment=custom_env,
)
def prepare_data_component(
    input_data: Input(type="uri_folder"),
    output_data: Output(type="uri_folder"),
):
    print("input_data: ", input_data)
    print("output_data: ", output_data)
@pipeline(
    default_compute=cpu_compute_target,
)
def pipeline_test(pipeline_input_data):
    prepare_data_node = prepare_data_component(
        input_data=pipeline_input_data,
        output_data=output_path,
    )
    return {
        "output_data": prepare_data_node.outputs.output_data,
    }

# create a pipeline
pipeline_job = pipeline_test(
    pipeline_input_data=input_path,
)
print(pipeline_job)
print(pipeline_job)
But it shows me this error:
[component] test com() got an unexpected keyword argument 'output_data', valid keywords: 'input_data'.
Can an output be passed as a parameter to a component? Or can you give me some references on how to use outputs?
I don’t want the official documentation because it is missing a lot of content, such as how to use outputs in the Python SDK.
Instead of passing the output as a parameter to the component, you assign it after creating the component node. Check the code below.
Command component.
from pathlib import Path
from uuid import uuid4

from mldesigner import command_component, Input, Output


@command_component(
    name="test_com",
    version="1",
    display_name="test com",
    description="test",
)
def prepare_data_component(
    input_data: Input(type="uri_folder"),
    output_data: Output(type="uri_folder"),
):
    print("input_data: ", input_data)
    print("output_data: ", output_data)
    model = str(uuid4())
    (Path(output_data) / "output").write_text(model)
Here, I am writing a random string to a file in the output folder.
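Note that inside the job, a uri_folder output such as output_data resolves to a local folder path that Azure ML mounts for you, so you can write files to it with ordinary file APIs, as the component above does.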
Next, create the pipeline:
from azure.ai.ml.dsl import pipeline


@pipeline(
    default_compute="cpu-cluster",
)
def pipeline_test(pipeline_input_data):
    prepare_data_node = prepare_data_component(
        input_data=pipeline_input_data,
    )
    prepare_data_node.outputs.output_data = Output(
        type="uri_folder", path="azureml://datastores/jgsblob/paths/text/"
    )
    return {
        "output": prepare_data_node.outputs.output_data,
    }


pipeline_job = pipeline_test(
    pipeline_input_data=input_path,
)
After you create the prepare_data_node node, if you don't assign its outputs, by default it will use the default datastore of your workspace.
If you want a custom path, you assign that path:

prepare_data_node.outputs.output_data = Output(type="uri_folder", path="azureml://datastores/jgsblob/paths/text/")
My datastore is jgsblob, which is my blob storage, and the folder is text.
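If you are unsure what datastores are registered in your workspace, you can look them up with MLClient. A minimal sketch, assuming ml_client is an authenticated MLClient; "jgsblob" is just this answer's example datastore name, substitute your own:

# List the datastores registered in the workspace, then fetch one by name.
for ds in ml_client.datastores.list():
    print(ds.name)

blob_ds = ml_client.datastores.get("jgsblob")
print(blob_ds.name, blob_ds.type)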
Next, run the job.
pipeline_job = ml_client.jobs.create_or_update(
    pipeline_job, experiment_name="pipeline_samples"
)
pipeline_job
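Once the job is submitted, you can wait for it to finish and pull the files back locally. A minimal sketch, assuming ml_client is the same authenticated MLClient; "output" matches the key returned from pipeline_test above, and download_path is an arbitrary local folder I chose for illustration:

# Block until the pipeline job finishes, streaming its logs.
ml_client.jobs.stream(pipeline_job.name)

# Download the named pipeline output to a local folder.
ml_client.jobs.download(
    name=pipeline_job.name,
    output_name="output",
    download_path="./pipeline_output",
)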
You will see the output in the node's data outputs preview in the studio, and in your blob storage container.
In the same way, assign the pipeline-level output like below:
pipeline_job.outputs.output = Output(
    type="uri_folder", mode="rw_mount", path=<custom_path>
)
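For example, putting it together before submission. A minimal sketch; the azureml:// path below is a placeholder of mine and should point at your own datastore and folder:

pipeline_job = pipeline_test(pipeline_input_data=input_path)

# Override the pipeline-level output binding before submitting.
# The datastore and folder here are hypothetical for this sketch.
pipeline_job.outputs.output = Output(
    type="uri_folder",
    mode="rw_mount",
    path="azureml://datastores/jgsblob/paths/pipeline_text/",
)

pipeline_job = ml_client.jobs.create_or_update(
    pipeline_job, experiment_name="pipeline_samples"
)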