According to the Kubernetes documentation:

> If you are using GCE, you can configure your cluster so that the number of nodes will be automatically scaled based on:
>
> - CPU and memory utilization.
> - The amount of CPU and memory requested by the pods (also called reservation).
Is this actually true?
I am running mainly Jobs on my cluster, and would like to spin up new instances to service them on demand. CPU usage doesn't work well as a scaling metric for this workload.
From Google's GKE documentation, however, this appears to be possible only via Cloud Monitoring metrics -- relying on a third-party service that you then have to customize. This seems like a perplexing gap in basic functionality that Kubernetes itself claims to support.
Is there any simpler way to achieve the very simple goal of having the GCE instance group autoscale based on the CPU requirements that I'm quite explicitly specifying in my GKE Jobs?
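For concreteness, here is the kind of Job spec I mean (the names and image are placeholders): each pod explicitly requests CPU and memory, which is exactly the reservation signal I'd like the autoscaler to act on.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: example-batch-job            # placeholder name
spec:
  template:
    spec:
      containers:
      - name: worker
        image: gcr.io/my-project/worker:latest   # placeholder image
        resources:
          requests:
            cpu: "500m"              # half a core reserved per pod
            memory: "512Mi"
      restartPolicy: Never
```

When many such Jobs are queued, the summed `requests` exceed the cluster's capacity, and that is the point at which I'd expect new nodes to be added.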
The disclaimer at the bottom of that section explains why it won't work by default in GKE:
> Note that autoscaling will work properly only if node metrics are accessible in Google Cloud Monitoring. To make the metrics accessible, you need to create your cluster with KUBE_ENABLE_CLUSTER_MONITORING equal to google or googleinfluxdb (googleinfluxdb is the default value). Please also make sure that you have Google Cloud Monitoring API enabled in Google Developer Console.
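If you bring the cluster up yourself with the open-source scripts (rather than through GKE), that setting is an environment variable read at provisioning time. A sketch, assuming a Kubernetes source checkout on GCE:

```shell
# kube-up.sh reads these variables when provisioning the cluster on GCE.
export KUBE_ENABLE_CLUSTER_MONITORING=google
cluster/kube-up.sh
```

You would still need the Cloud Monitoring API enabled for the project in the Google Developer Console, as the note says.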
You might be able to get it working by standing up a heapster instance in your cluster configured with --sink=gcm (like this), but I think it was more of an older proof of concept than a well-maintained, production-grade configuration.
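A minimal sketch of what that might look like — the image tag and flag spellings are assumptions based on older heapster releases, so check them against the version you actually deploy:

```yaml
apiVersion: v1
kind: ReplicationController       # older heapster examples predate Deployments
metadata:
  name: heapster
  namespace: kube-system
spec:
  replicas: 1
  selector:
    k8s-app: heapster
  template:
    metadata:
      labels:
        k8s-app: heapster
    spec:
      containers:
      - name: heapster
        image: gcr.io/google_containers/heapster:v0.18.2   # assumption: pick a released tag
        command:
        - /heapster
        - --source=kubernetes      # collect node/pod metrics via the API server
        - --sink=gcm               # push them to Google Cloud Monitoring
```

With the metrics flowing into Cloud Monitoring, the GCE managed instance group autoscaler should then have the node data the disclaimer above says it needs.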