Autoscaling automatically adds or removes instances in my instance group based on CPU utilization.
Is the same possible with GPU utilization instead?
You didn't mention whether you mean the Cluster Autoscaler or the Horizontal Pod Autoscaler, nor whether you're running in GKE or GCE, so both cases are covered below.
1 - In GKE, there are two types of Autoscalers:
a - The Cluster Autoscaler, which adds nodes when more node capacity is required:
The Cluster Autoscaler scales the node pools in the cluster. In this situation, you have to spin up node instances with GPU accelerators; use NVIDIA Tesla GPUs, which are the models supported in GKE. To take advantage of the Cluster Autoscaler, it is recommended to create a separate GPU node pool in the cluster. The GPU node pool then scales up automatically when pods requesting GPUs cannot be scheduled, and scales back down when its nodes are underutilized; a sketch of a pod that triggers such a scale-up follows.
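As a minimal sketch (assuming the cluster already has an autoscaling GPU node pool and the NVIDIA device plugin installed; the pod name and CUDA image below are illustrative), creating a pod that requests the `nvidia.com/gpu` resource is what triggers the scale-up:

```python
from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() when running in-cluster

# A pod requesting one GPU. If no existing GPU node can host it, the
# Cluster Autoscaler adds a node to the GPU node pool; once such pods
# are gone and the nodes sit idle, the pool scales back down.
pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="gpu-job"),  # illustrative name
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="cuda-container",
                image="nvidia/cuda:11.0.3-base-ubuntu20.04",  # illustrative image
                command=["nvidia-smi"],
                resources=client.V1ResourceRequirements(
                    limits={"nvidia.com/gpu": "1"}  # the GPU request
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```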
b - The Horizontal Pod Autoscaler, which adds pod replicas when more are required:
The Horizontal Pod Autoscaler (HPA) scales the number of pod replicas. By default, the HPA collects metrics through the resource metrics API, but it can also autoscale on custom metrics, including metrics available in Stackdriver, so GPU utilization exported as such a metric can drive it. For a step-by-step tutorial, you may consult this StackOverflow thread; a sketch of such an HPA follows.
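As a minimal sketch of such an HPA (assuming the GPU metric is already exported to Stackdriver as `custom.googleapis.com/gpu_utilization` and surfaced to Kubernetes by the Stackdriver custom-metrics adapter; the HPA, deployment, and metric names are illustrative), built here as a plain manifest that can be written out and applied with `kubectl apply -f`:

```python
import yaml

# HPA that scales a deployment on an external Stackdriver metric. The
# Stackdriver adapter exposes metric names with "|" in place of "/",
# since Kubernetes metric names cannot contain slashes.
hpa = {
    "apiVersion": "autoscaling/v2",
    "kind": "HorizontalPodAutoscaler",
    "metadata": {"name": "gpu-hpa"},  # illustrative name
    "spec": {
        "scaleTargetRef": {
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "name": "gpu-workload",  # illustrative deployment name
        },
        "minReplicas": 1,
        "maxReplicas": 10,
        "metrics": [{
            "type": "External",
            "external": {
                "metric": {"name": "custom.googleapis.com|gpu_utilization"},
                "target": {
                    "type": "AverageValue",
                    "averageValue": "60",  # target: 60% average GPU utilization
                },
            },
        }],
    },
}

# Write the manifest out so it can be applied with `kubectl apply -f gpu-hpa.yaml`.
with open("gpu-hpa.yaml", "w") as f:
    yaml.safe_dump(hpa, f)
```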
2 - In GCE, the autoscaler scales a managed instance group according to an autoscaling policy. The available policies are CPU utilization, HTTP load balancing serving capacity, and Stackdriver Monitoring metrics.
The last policy means that you can autoscale the instances of a managed instance group (autoscaling works only on managed instance groups) based on a Stackdriver Monitoring metric. There is no built-in Stackdriver metric for GPUs, but you can create a custom metric that reports GPU utilization and then autoscale the instance group on that custom metric. I also found an interesting article on how to create a custom metric in Stackdriver based on GPU utilization; a sketch of the reporting side follows.
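As a minimal sketch of the reporting side (assuming the `google-cloud-monitoring` Python library and `nvidia-smi` are available on the instance; the project ID, metric name, and resource labels below are illustrative), an agent like this would run on each GPU instance and push utilization once a minute:

```python
import subprocess
import time

from google.cloud import monitoring_v3

PROJECT_ID = "my-project"  # illustrative project ID
METRIC_TYPE = "custom.googleapis.com/gpu_utilization"  # illustrative metric name


def read_gpu_utilization() -> float:
    """Read the utilization (percent) of the first GPU via nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"]
    )
    return float(out.decode().strip().splitlines()[0])


def report(client: monitoring_v3.MetricServiceClient, value: float) -> None:
    """Write one data point of the custom metric to Stackdriver."""
    series = monitoring_v3.TimeSeries()
    series.metric.type = METRIC_TYPE
    series.resource.type = "gce_instance"
    series.resource.labels["instance_id"] = "1234567890"  # illustrative
    series.resource.labels["zone"] = "us-central1-a"      # illustrative

    now = time.time()
    interval = monitoring_v3.TimeInterval(
        {"end_time": {"seconds": int(now), "nanos": int((now % 1) * 1e9)}}
    )
    series.points = [monitoring_v3.Point(
        {"interval": interval, "value": {"double_value": value}}
    )]
    client.create_time_series(
        name=f"projects/{PROJECT_ID}", time_series=[series]
    )


if __name__ == "__main__":
    client = monitoring_v3.MetricServiceClient()
    while True:
        report(client, read_gpu_utilization())
        time.sleep(60)  # report once per minute
```

The autoscaling policy on the managed instance group would then reference `custom.googleapis.com/gpu_utilization` with a target utilization level.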