I’ve deployed an asynchronous SageMaker endpoint and I want it to scale in (down to 0 instances) when nothing has been requested for a period of time, and to scale out (to at most 1 instance) when a request arrives.
I followed some posts online and created the scaling policy like this:
self.scaling_policies = self.client_autoscaling.put_scaling_policy(
    PolicyName=self.policy_name,
    ServiceNamespace="sagemaker",
    ResourceId=self.resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="StepScaling",
    StepScalingPolicyConfiguration={
        "AdjustmentType": "ChangeInCapacity",
        "MetricAggregationType": "Average",
        "Cooldown": 60,
        "StepAdjustments": [
            {
                "MetricIntervalLowerBound": 0,
                "ScalingAdjustment": 1,
            }
        ],
    },
)
response = self.client_cloudwatch.put_metric_alarm(
    AlarmName=self.policy_name,
    MetricName='HasBacklogWithoutCapacity',
    Namespace='AWS/SageMaker',
    Statistic='Average',
    EvaluationPeriods=self.evaluation_periods,
    DatapointsToAlarm=self.datapoints,
    Threshold=self.target_value,
    ComparisonOperator='GreaterThanOrEqualToThreshold',
    TreatMissingData='breaching',
    Dimensions=[
        {'Name': 'EndpointName', 'Value': self.endpoint_name},
    ],
    Period=self.period,
    AlarmActions=[self.scaling_policies['PolicyARN']],
)
The endpoint is created successfully and I can see that a scaling policy is attached to it. I’ve also seen that an alarm is triggered, but the endpoint doesn’t scale in (it is deployed with an initial instance count of 1 and I expect it to go to 0 quickly).
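For comparison, the step policy shown above only ever adds an instance (ScalingAdjustment of 1), and the AWS guide linked below handles the scale-in direction separately: a scalable target registered with MinCapacity set to 0, plus a target-tracking policy on the ApproximateBacklogSizePerInstance metric. Here is a minimal sketch of that part, not a confirmed fix. The helper names, the default target value, and the cooldowns are assumptions for illustration; the client passed in is assumed to be a boto3 "application-autoscaling" client.

```python
from typing import Any, Dict

SCALABLE_DIMENSION = "sagemaker:variant:DesiredInstanceCount"


def build_resource_id(endpoint_name: str, variant_name: str) -> str:
    """Resource ID format used by Application Auto Scaling for SageMaker variants."""
    return f"endpoint/{endpoint_name}/variant/{variant_name}"


def scalable_target_kwargs(resource_id: str, max_capacity: int = 1) -> Dict[str, Any]:
    """Arguments for register_scalable_target; MinCapacity=0 permits scaling to zero."""
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": SCALABLE_DIMENSION,
        "MinCapacity": 0,
        "MaxCapacity": max_capacity,
    }


def backlog_policy_kwargs(
    policy_name: str,
    resource_id: str,
    endpoint_name: str,
    target_value: float = 5.0,      # assumed value, tune for your workload
    scale_in_cooldown: int = 600,   # assumed value, in seconds
    scale_out_cooldown: int = 300,  # assumed value, in seconds
) -> Dict[str, Any]:
    """Arguments for put_scaling_policy: target tracking on the async backlog metric."""
    return {
        "PolicyName": policy_name,
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": SCALABLE_DIMENSION,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_value,
            "CustomizedMetricSpecification": {
                "MetricName": "ApproximateBacklogSizePerInstance",
                "Namespace": "AWS/SageMaker",
                "Dimensions": [{"Name": "EndpointName", "Value": endpoint_name}],
                "Statistic": "Average",
            },
            "ScaleInCooldown": scale_in_cooldown,
            "ScaleOutCooldown": scale_out_cooldown,
        },
    }


def apply_scale_to_zero(
    autoscaling_client: Any, endpoint_name: str, variant_name: str, policy_name: str
) -> Dict[str, Any]:
    """Register the scalable target, then attach the target-tracking policy."""
    resource_id = build_resource_id(endpoint_name, variant_name)
    autoscaling_client.register_scalable_target(**scalable_target_kwargs(resource_id))
    return autoscaling_client.put_scaling_policy(
        **backlog_policy_kwargs(policy_name, resource_id, endpoint_name)
    )
```

A call would look like apply_scale_to_zero(boto3.client("application-autoscaling"), "my-endpoint", "AllTraffic", "backlog-target-tracking"), where the endpoint, variant, and policy names are placeholders. The step policy and alarm from my code would stay in place alongside it to handle waking up from 0 instances.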
Any help?
I tried to create an asynchronous SageMaker endpoint with a scaling policy to scale in and out, following these posts:
https://docs.aws.amazon.com/sagemaker/latest/dg/async-inference-autoscale.html
How to Quickly Scale SageMaker Async Endpoint from 0 to 1 Instance for a Single Request?
https://medium.com/@neethu.v.gopal/asynchronous-endpoints-for-stable-diffusion-in-aws-using-sagemaker-with-autoscaling-b0db4206648b