1-minute service timeout for AMLS models deployed on ACI or AKS

9/23/2019

We created an image scoring model in Azure Machine Learning Service (AMLS) and deployed it from the AMLS portal to both ACI and AKS. It works for smaller images, but for larger images the request times out after exactly 1 minute on both ACI and AKS. Scoring an image is expected to take a few minutes.

I would like to know whether this is a limitation of AMLS deployments, or whether ACI and AKS time out the deployed web service after 60 seconds. Any workaround would be welcome.

ACI error: Post http://localhost:5001/score: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

AKS error: Replica closed connection before replying

-- Prashant Kumar
azure
azure-container-instances
azure-kubernetes
azure-machine-learning-service

2 Answers

9/23/2019

The deployment configuration class has a timeout setting that you can change in the constructor, which can help. Some clients will time out anyway.

https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.webservice.aks.aksservicedeploymentconfiguration?view=azure-ml-py

scoring_timeout_ms : int => A timeout to enforce for scoring calls to this Webservice. Defaults to 60000
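
A minimal sketch of how that setting might be raised for an AKS deployment, assuming the azureml-core SDK and its AksWebservice.deploy_configuration factory. The 5-minute value and the two helper function names are illustrative assumptions, not part of the answer above.

```python
# Assumed target: allow scoring calls to run for up to 5 minutes.
SCORING_TIMEOUT_MS = 5 * 60 * 1000


def aks_timeout_kwargs(scoring_timeout_ms=SCORING_TIMEOUT_MS):
    """Hypothetical helper: keyword arguments for the AKS deployment configuration."""
    if scoring_timeout_ms <= 0:
        raise ValueError("scoring_timeout_ms must be positive")
    return {"scoring_timeout_ms": scoring_timeout_ms}


def make_aks_deployment_config(scoring_timeout_ms=SCORING_TIMEOUT_MS):
    """Create the deployment configuration (requires the azureml-core package)."""
    # Imported lazily so the rest of this sketch runs without the SDK installed.
    from azureml.core.webservice import AksWebservice

    return AksWebservice.deploy_configuration(**aks_timeout_kwargs(scoring_timeout_ms))


print(aks_timeout_kwargs())  # {'scoring_timeout_ms': 300000}
```

The object returned by make_aks_deployment_config() would then be passed as the deployment configuration when the model is deployed to AKS. Any HTTP client calling the service still needs its own timeout raised to match.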

-- Greg
Source: StackOverflow

10/2/2019

If you are deploying a service in AKS, then @Greg's solution should be sufficient for most cases. However, if your value for scoring_timeout_ms is going to exceed 60000 milliseconds (i.e. 60 seconds), then I recommend also tuning the following config settings. When your model is deployed in Kubernetes as a deployment, we define a LivenessProbe so that if your model container becomes unresponsive, Kubernetes can automatically restart the container in an effort to restore the health of your model.

  • period_seconds: the time interval between LivenessProbe executions. If your model is going to take 45 seconds to respond to a scoring request, then one thing you can do is increase this interval from the default 10 seconds to 30 seconds (or more).
  • failure_threshold: the number of LivenessProbe failures after which Kubernetes restarts your model container. If you want to run the LivenessProbe every 10 seconds and your model is going to take 45 seconds to respond, then you can increase failure_threshold from the default 3 to 10. Kubernetes will then restart your container only after 10 consecutive LivenessProbe failures.
  • timeout_seconds: the time the LivenessProbe waits for a reply before giving up. Another option is to increase timeout_seconds from the default 2 seconds to 30 seconds. The LivenessProbe will then wait up to 30 seconds when your container is busy; when the container is not busy, it replies sooner.

There is no one "correct" config setting to modify, but a combination of these will definitely help prevent the 502 "Replica closed connection before replying" error.
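
As a rough sketch, the probe settings above can be collected into one set of keyword arguments and checked against the expected scoring time. The estimate below, (failure_threshold - 1) * period_seconds + timeout_seconds, is only an approximation of how long an unresponsive container survives before a restart; it ignores the initial delay and probe scheduling details. The helper names and the tuned values are illustrative assumptions. The defaults are the ones quoted in this answer.

```python
# Defaults as quoted in the answer above.
DEFAULT_PROBE = {"period_seconds": 10, "failure_threshold": 3, "timeout_seconds": 2}

# Assumed example values for a model that may take a few minutes to score.
TUNED_PROBE = {"period_seconds": 30, "failure_threshold": 10, "timeout_seconds": 30}


def approx_restart_after_seconds(probe):
    """Approximate time an unresponsive container survives before a restart."""
    return (probe["failure_threshold"] - 1) * probe["period_seconds"] + probe["timeout_seconds"]


def make_aks_deployment_config(probe, scoring_timeout_ms):
    """Hypothetical helper: build the AKS config (requires the azureml-core package)."""
    # Imported lazily so the estimate can be computed without the SDK installed.
    from azureml.core.webservice import AksWebservice

    return AksWebservice.deploy_configuration(scoring_timeout_ms=scoring_timeout_ms, **probe)


print(approx_restart_after_seconds(DEFAULT_PROBE))  # 22
print(approx_restart_after_seconds(TUNED_PROBE))    # 300
```

With the defaults, the estimate is well under a 45-second scoring call, which is consistent with the restarts described above. The tuned values leave room for a scoring call of a few minutes.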

-- Parth Shah
Source: StackOverflow