GKE nodes unexpectedly deleted and recreated

4/12/2019

I created a cluster on Google Kubernetes Engine. The nodes get deleted and recreated very often (at least once a day). New instances are created to replace them and the pods are rescheduled onto the new nodes, but I would like to understand why the nodes disappear in the first place.

I checked the settings used to create the cluster and the node pool:

  • "Automatic node upgrade" is Disabled on the node pool.
  • "Pre-emptible nodes" is Disabled.
  • "Automatic node repair" is Enabled, but it doesn't look like a node repair happened, since I don't see anything in gcloud container operations list around the time my nodes were deleted (see the command below).
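
This is roughly the command I ran to look for repair operations (the AUTO_REPAIR_NODES operation type in the filter is a guess on my part; adjust it if needed):

# filter for auto-repair operations (operation type name is a guess)
➜  ~ gcloud container operations list \
        --filter="operationType=AUTO_REPAIR_NODES" \
        --format="table(name, operationType, startTime, status)"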

I can see that the current nodes were all (re-)created at 21:00, while the cluster was created at 08:35:

➜  ~ gcloud container clusters describe my-cluster --format=json
{
  "createTime": "2019-04-11T08:35:39+00:00",
  ...
  "nodePools": [
    {
      ...
      "management": {
        "autoRepair": true
      },
      "name": "default-pool",
    }
  ],
  "status": "RUNNING",
  ...
}
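
The node creation timestamps themselves can also be listed from Kubernetes (one way to print them):

# print each node with its creationTimestamp
➜  ~ kubectl get nodes -o custom-columns='NAME:.metadata.name,CREATED:.metadata.creationTimestamp'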

How can I trace the reason why the nodes were deleted?

-- TagadaPoe
google-kubernetes-engine
kubernetes

2 Answers

10/18/2019

This just happened to me on Sunday 13/10/2019. All data from the stateful partition is gone as well.

-- Wojtas.Zet
Source: StackOverflow

4/12/2019

I tried to reproduce your problem by creating a cluster, manually stopping the kubelet on a node (by running systemctl stop kubelet) to trigger a repair, and watching the node recover. In my case, I do see an operation for the auto node repair, but I can also see in the GCE operations log that the VM was deleted and recreated (by the GKE robot account).
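
Roughly the steps I used (NODE_NAME and ZONE are placeholders for one of your node instances and its zone):

# SSH to the node and stop the kubelet so the node goes NotReady
➜  ~ gcloud compute ssh NODE_NAME --zone ZONE
node ~ $ sudo systemctl stop kubelet
# then watch the node get repaired (deleted and recreated)
➜  ~ kubectl get nodes -w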

If you run gcloud compute operations list (or check the operations page in the Cloud Console), you should see what caused the VM to be deleted and recreated.
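
For example (NODE_NAME is a placeholder for one of the recreated instances; the filter and format fields are just suggestions):

# list compute operations that targeted the node, including who issued them
➜  ~ gcloud compute operations list \
        --filter="targetLink:NODE_NAME" \
        --format="table(operationType, targetLink.basename(), insertTime, user)"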

-- Robert Bailey
Source: StackOverflow