Kubernetes Horizontal Pod Autoscaler on GKE - "failed to get CPU utilization"

9/11/2016

I am fairly new to Kubernetes and GKE (Google Container Engine), so I was experimenting with the horizontal pod autoscaling and cluster autoscaling features. I hit my load balancer hard enough that the autoscaler created more pods than the existing instances could hold, so the cluster added instances as well. Eventually the cluster reached its maximum number of instances while some pods were still unscheduled, so those pods were left in the Pending state.

I then stopped the load test, hoping everything would scale back down on its own, but it didn't. When I looked at kubectl describe hpa, I saw errors like:

7m            18s             18      {horizontal-pod-autoscaler }                    Warning         FailedGetMetrics      failed to get CPU consumption and request: metrics obtained for 4/5 of pods
7m            18s             18      {horizontal-pod-autoscaler }                    Warning         FailedComputeReplicas failed to get CPU utilization: failed to get CPU consumption and request: metrics obtained for 4/5 of pods

There are actually only 4 pods running (and none in Pending state). Looking at the heapster logs (kubectl logs -f heapster-v1.1.0-<id> --namespace=kube-system heapster), I can see that it is looking for metrics for a pod that doesn't exist anymore (this would be the mystery 5th pod it's complaining about).

The problem is that, because metrics for the 5th pod are missing, the autoscaler can't finish computing the current CPU utilization for the 4 pods that are running, and thus horizontal pod autoscaling doesn't work.
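
To illustrate why one missing pod seems to block the whole calculation, here is a minimal Python sketch of how a CPU-based autoscaler might compute utilization and the desired replica count. It is a simplified model that assumes the behavior implied by the error message above, not the actual Kubernetes controller code: the function names, pod names, and numbers are all hypothetical, and the real controller also applies a tolerance and min/max replica bounds that are left out here.

```python
import math


def cpu_utilization_percent(pod_requests_millicores, pod_usage_millicores):
    """Return total CPU usage as a percentage of total requested CPU.

    pod_requests_millicores: pod name -> CPU request, for every pod the
        autoscaler believes belongs to the target (hypothetical input).
    pod_usage_millicores: pod name -> measured usage, only for the pods
        that actually reported metrics (hypothetical input).
    """
    expected = len(pod_requests_millicores)
    missing = [name for name in pod_requests_millicores
               if name not in pod_usage_millicores]
    if missing:
        # Assumption: metrics for every expected pod are required, so a
        # single stale pod entry makes the whole computation fail.
        obtained = expected - len(missing)
        raise RuntimeError(
            "metrics obtained for %d/%d of pods" % (obtained, expected))
    total_usage = sum(pod_usage_millicores[name]
                      for name in pod_requests_millicores)
    total_request = sum(pod_requests_millicores.values())
    return 100.0 * total_usage / total_request


def desired_replicas(current_replicas, utilization_percent, target_percent):
    """Scale the replica count in proportion to utilization, rounding up."""
    return math.ceil(current_replicas * utilization_percent / target_percent)
```

With five expected pods but usage data for only four, cpu_utilization_percent raises an error instead of returning a value, so desired_replicas is never reached and the replica count stays where it is. That matches the symptom here: the load is gone, but the autoscaler never scales down.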

Any ideas how to get out of a situation like this?

I've tried removing the hpa and creating it again, but it didn't help.

-- Gman
google-kubernetes-engine
heapster
kubernetes

0 Answers