I have a simple WordPress site defined by the ReplicationController and Service below. Once the app was deployed and running happily, I enabled autoscaling on the instance group created by Kubernetes by going to the GCE console and using the same settings as for the pods (max 5, CPU 10%).
Autoscaling of the instances and of the pods each seems to work well enough, except that they keep going out of sync with each other. The RC autoscaler removes pods from the GCE instances, but nothing happens to the instances themselves, so they start failing requests until the LB health check fails and removes them.
My process is as follows:
Create the cluster
$ gcloud container clusters create wordpress -z us-central1-c -m f1-micro
Create the rc
$ kubectl create -f rc.yml
Create the service
$ kubectl create -f service.yml
Autoscale the rc
$ kubectl autoscale rc frontend --max 5 --cpu-percent=10
Then I enabled the autoscaling in the console and gave the servers load to make them scale.
apiVersion: v1
kind: ReplicationController
metadata:
  name: frontend
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: wordpress
    spec:
      containers:
      - image: custom-wordpress-image
        name: wordpress
        ports:
        - containerPort: 80
          hostPort: 80
apiVersion: v1
kind: Service
metadata:
  labels:
    name: frontend
  name: frontend
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 80
    protocol: TCP
  selector:
    name: wordpress
Update for more information
If I don't use the Kubernetes autoscaler and instead set the replicas to the same number as the instance group autoscaler's max instance count, I seem to get the desired result. As instances are added to the instance group, Kubernetes provisions them; as they are removed, Kubernetes updates accordingly. At this point I wonder what the purpose of the Kubernetes autoscaler is.
From what I understand, the Kubernetes autoscaler functionality is primarily intended for the case where RCs and deployments have CPU limits defined for their children. You can define an autoscaler with a min and max number of pods and a target CPU usage for the pods, and it will scale pods across the cluster based on those limits regardless of cluster size. If the pods have no limits, then you would likely want to scale the cluster and schedule one additional pod per additional node. I'm not sure that aligns with best practices for containerized services, though, because any node running an unlimited pod can be dominated by it, which might adversely limit other pods' ability to run. It's comparatively unpredictable.
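To make the "target CPU usage" idea concrete, here is a minimal sketch of the scaling rule as the Kubernetes documentation describes it: desired replicas = ceil(current replicas * current utilization / target utilization), clamped to the min and max. The function name `desired_replicas` is made up for illustration, and this simplified version ignores the tolerance band and stabilization behavior that the real controller applies. Note that the utilization percentage is measured against the pods' CPU requests.

```shell
# Simplified HPA rule (illustrative only, not the controller's actual code).
# Arguments: current replicas, current avg CPU %, target CPU %, min pods, max pods.
desired_replicas() {
  local current="$1" cpu_now="$2" cpu_target="$3" min="$4" max="$5"
  # Integer ceiling division: ceil(a / b) == (a + b - 1) / b
  local desired=$(( (current * cpu_now + cpu_target - 1) / cpu_target ))
  # Clamp the result to the configured bounds.
  if [ "$desired" -lt "$min" ]; then desired="$min"; fi
  if [ "$desired" -gt "$max" ]; then desired="$max"; fi
  echo "$desired"
}

# One pod averaging 35% CPU against the 10% target from the question, max 5 pods.
desired_replicas 1 35 10 1 5   # prints 4
```

With the question's settings, even modest load pushes the pod count toward the maximum quickly, independent of how many nodes exist to run them.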
In your use case, Kubernetes is only giving you overhead. You are running one pod (Docker container) on each instance in your instance group. You could instead deploy your Docker container to App Engine flexible (formerly Managed VMs) https://cloud.google.com/appengine/docs/flexible/custom-runtimes/ and let the autoscaling of your instance group handle it.
It is not possible (yet) to link the instance scaling to the pod scaling in k8s. This is because they are two separate problems. The HPA of k8s is meant to have (small) pods scale to spread load over your cluster (big machines) so they will be scaling because of increased load.
If you do not define any limits (1 pod per machine), you could set the maximum number of pods to the maximum size of your cluster, effectively leaving the extra pods in a pending state until another instance spins up.
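As a rough illustration of that arithmetic, assuming strictly one pod per node (in the question this is forced by `hostPort: 80`, since two pods cannot bind the same host port on one node), the number of pods left pending is just the shortfall in nodes. The function name `pending_pods` is hypothetical.

```shell
# Pods that cannot be scheduled when each node can host exactly one pod.
# Arguments: requested replicas, nodes currently in the cluster.
pending_pods() {
  local replicas="$1" nodes="$2"
  local pending=$(( replicas - nodes ))
  # More nodes than replicas means nothing is waiting.
  if [ "$pending" -lt 0 ]; then pending=0; fi
  echo "$pending"
}

# 5 replicas requested while the instance group has only 2 nodes.
pending_pods 5 2   # prints 3
```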
If you want your pods to let your nodes scale then the best way (we found out) is to have them 'overcrowd' an instance so the instance-group scaling will kick in. We did this by setting pretty low memory/cpu requirements for our pods and high limits, effectively allowing them to burst over the total available CPU/memory of the instance.
resources:
  requests:
    cpu: 400m
    memory: 100Mi
  limits:
    cpu: 1000m
    memory: 1000Mi
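To see why these values overcrowd an instance, here is a small sketch of the CPU arithmetic. The scheduler places pods by their requests, not their limits, so the sum of limits on a node can exceed what the node actually has. The node size of 1000 millicores is an assumed figure for illustration only, and both function names are hypothetical.

```shell
# All CPU values are in millicores.
# How many pods the scheduler can place on a node, judged by requests only.
pods_per_node() {
  local node_cpu="$1" request="$2"
  echo $(( node_cpu / request ))
}

# Sum of the placed pods' CPU limits as a percentage of the node's CPU.
limit_overcommit_percent() {
  local node_cpu="$1" request="$2" limit="$3"
  local pods=$(( node_cpu / request ))
  echo $(( pods * limit * 100 / node_cpu ))
}

# Assumed 1000m node, with the 400m request and 1000m limit from above.
pods_per_node 1000 400                   # prints 2
limit_overcommit_percent 1000 400 1000   # prints 200
```

Two pods fit by request, but together they may try to use twice the node's CPU, which is what drives instance CPU high enough for the instance-group autoscaler to react.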
With the cluster autoscaling added in Kubernetes 1.3, I can now have Kubernetes autoscale both my cluster and my pods.
Using GCP's create command, I can now easily add an autoscaled cluster using the --enable-autoscaling flag combined with the --min-nodes, --max-nodes, and --num-nodes flags.
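As a sketch, the full command might look like the following, reusing the cluster name, zone, and machine type from the question. The node counts (start at 1, scale between 1 and 5) are assumptions chosen to match the "max 5" setting, and the wrapper function `create_autoscaled_cluster` is hypothetical. It only prints the command; remove the leading `echo` to actually run it.

```shell
# Dry run: prints the gcloud command instead of executing it.
create_autoscaled_cluster() {
  echo gcloud container clusters create wordpress \
    --zone us-central1-c \
    --machine-type f1-micro \
    --num-nodes 1 \
    --enable-autoscaling --min-nodes 1 --max-nodes 5
}

create_autoscaled_cluster
```

Pod autoscaling is still set up separately, for example with the `kubectl autoscale rc frontend --max 5 --cpu-percent=10` command from the question.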