I am trying to use Parquet as the data format in an AWS SageMaker pipeline.
However, when I use Parquet as the dataset format for a ClarifyCheckStep in the pipeline, the step fails with:

ClientError: Content type is missing when using Parquet dataset. Please specify the content type

The sample code looks like this:
```python
from sagemaker.clarify import DataConfig
from sagemaker.workflow.clarify_check_step import (
    ClarifyCheckStep,
    ModelExplainabilityCheckConfig,
    SHAPConfig,
)

# step_preprocess, config_model and config_check_job are defined
# earlier in the pipeline (not shown here).
output_model_explainability = "s3://foo/bar"
model_explainability_analysis_cfg_output_path = "s3://foo/baz"

config_model_explainability_data = DataConfig(
    # Parquet files written to the preprocessing step's "train" output
    s3_data_input_path=step_preprocess.properties.ProcessingOutputConfig.Outputs[
        "train"
    ].S3Output.S3Uri,
    s3_output_path=output_model_explainability,
    s3_analysis_config_output_path=model_explainability_analysis_cfg_output_path,
    label=0,  # the label is the first column of the dataset
    dataset_type="application/x-parquet",
)

config_shap = SHAPConfig(seed=42, num_samples=10)

config_check_model_explainability = ModelExplainabilityCheckConfig(
    data_config=config_model_explainability_data,
    model_config=config_model,
    explainability_config=config_shap,
)

step_check_model_explainability = ClarifyCheckStep(
    name="ModelExplainabilityCheckStep",
    clarify_check_config=config_check_model_explainability,
    check_job_config=config_check_job,
    skip_check=True,
    register_new_baseline=True,
    supplied_baseline_constraints=True,
    model_package_group_name="packagename",
)
```