Scaling GPU-based deployments based on demand

6/18/2019

I am currently deploying GPU instances and scaling them on the duty cycle metric, but it is not a very good metric.

We have a deployment that uses GPUs and exposes a REST API that other jobs/pods can call to perform inference. How can I efficiently scale it with an HPA, and based on which metric?
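Here is a simplified sketch of the kind of deployment I mean (the name, image, and port are placeholders, not our actual setup):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-inference
spec:
  replicas: 2
  selector:
    matchLabels:
      app: gpu-inference
  template:
    metadata:
      labels:
        app: gpu-inference
    spec:
      containers:
      - name: inference-server
        image: example.com/inference-server:latest   # placeholder image
        ports:
        - containerPort: 8080                        # REST inference endpoint
        resources:
          limits:
            nvidia.com/gpu: 1                        # one GPU per replica
```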

Currently I am trying to use the duty cycle metric, but it is not very reliable.
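For reference, this is roughly what my current HPA looks like, assuming the duty cycle metric is exposed through the custom metrics API (e.g. via a metrics adapter); the metric name and target value here are just illustrative:

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: gpu-inference-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: gpu-inference
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: duty_cycle            # assumed custom metric name
      target:
        type: AverageValue
        averageValue: "80"          # assumed target: ~80% duty cycle per pod
```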

Edit: No, this question is not a duplicate of Autoscaling based on GPU utilization?. I am looking for a Kubernetes metric to set autoscaling on.

-- aeroith
autoscaling
gpu
kubernetes

0 Answers