How to drain GKE and shut down the underlying Compute Engine VMs

3/8/2020

We followed this Google tutorial. At step 7 the app is up and running, and we see "Hello World" when we open the URL: http://:8080/

What we want:

  1. Drain GKE
  2. Stop underlying VM

Why: We have a TEST & DEV environment that should not run 24x7, and we want to save money.

What we tried:

  1. Get nodes
$ kubectl get nodes
gke-friday-test-default-pool-6478f7c8-4x30   Ready    <none>   13m   v1.14.10-gke.17
gke-friday-test-default-pool-6478f7c8-j9vz   Ready    <none>   13m   v1.14.10-gke.17
gke-friday-test-default-pool-6478f7c8-lhfc   Ready    <none>   13m   v1.14.10-gke.17
  2. Drain GKE
$ kubectl drain gke-friday-test-default-pool-6478f7c8-4x30 --ignore-daemonsets
$ kubectl drain gke-friday-test-default-pool-6478f7c8-j9vz --ignore-daemonsets
$ kubectl drain gke-friday-test-default-pool-6478f7c8-lhfc --ignore-daemonsets
  3. Verify pods are drained
$ kubectl get pods
NAME                          READY    STATUS    RESTARTS     AGE
hello-app-586d849658-77xbn    0/1      Pending   0            2m7s
hello-app-586d849658-dr4vx    0/1      Pending   0            2m7s
hello-app-586d849658-jwl5d    0/1      Pending   0            2m7s
hello-app-586d849658-kwlvh    0/1      Pending   0            2m7s
  4. We believe that GKE is now cordoned and dormant, but we are not sure.

    Is any other step required? When we open the URL http://:8080/ again, we get ERR_EMPTY_RESPONSE instead of "Hello World".

    I would like to point out that the tutorial sets up a load balancer, which we did not touch.

  5. In the Console, under Node details, the CPU, Memory, and Disk graphs show no data. It looks like the nodes are not consuming any resources.

  6. We shut down the Compute Engine VMs (the tutorial cluster has 3 nodes):

$ gcloud compute instances stop gke-friday-test-default-pool-6478f7c8-4x30 --async --zone=australia-southeast1-c
$ gcloud compute instances stop gke-friday-test-default-pool-6478f7c8-j9vz --async --zone=australia-southeast1-c
$ gcloud compute instances stop gke-friday-test-default-pool-6478f7c8-lhfc --async --zone=australia-southeast1-c
  7. In the Console, under Compute Engine / Instance groups, we see the three nodes with the status: "Instance is being recreated"

What did we miss? Which part of the architecture prevents the VMs from being shut down? We saw that GKE is cordoned, so nothing should be consuming resources. The only thing I can imagine interfering is the load balancer.

Any idea what prevents the VMs from being shut down?

Thanks

-- iDeliver Project Manager
google-kubernetes-engine

1 Answer

3/8/2020

If you have a GKE cluster with a node pool whose desired size is 3 nodes and you manually stop or delete some of the node VMs, GKE will recreate them to bring the pool back to the desired state. You should not manually stop or delete individual node VMs. Instead, scale the desired size of your node pool down to 0, and GKE will handle the orchestration of actually deleting the nodes. See: https://cloud.google.com/kubernetes-engine/docs/how-to/node-pools#resizing_a_node_pool
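
A minimal sketch of that resize, for illustration only. The cluster name friday-test and the pool name default-pool are assumptions inferred from the node names in the question (gke-friday-test-default-pool-...); the zone is taken from the gcloud commands in the question. The resize_pool helper and the DRY_RUN switch are hypothetical names introduced here. By default the helper only prints the command so it can be checked before anything is changed.

```shell
# Assumed values, inferred from the question; adjust to your own cluster.
CLUSTER="${CLUSTER:-friday-test}"
NODE_POOL="${NODE_POOL:-default-pool}"
ZONE="${ZONE:-australia-southeast1-c}"

# resize_pool <node-count>
# Builds the node pool resize command. With DRY_RUN=1 (the default) it only
# prints the command; with DRY_RUN=0 it runs it through gcloud.
resize_pool() {
  num_nodes="$1"

  # Accept only a non-negative integer as the node count.
  case "$num_nodes" in
    ''|*[!0-9]*)
      echo "usage: resize_pool <node-count>" >&2
      return 2
      ;;
  esac

  set -- gcloud container clusters resize "$CLUSTER" \
    --node-pool "$NODE_POOL" \
    --num-nodes "$num_nodes" \
    --zone "$ZONE" \
    --quiet

  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

With this loaded, resize_pool 0 prints the command that would scale the pool down at the end of the day, and resize_pool 3 the one that would bring it back. Setting DRY_RUN=0 executes them. Note that --quiet suppresses gcloud's confirmation prompt, so leave it out if you prefer to confirm each resize interactively.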

-- Amit Kumar Gupta
Source: StackOverflow