Kubernetes: Node NotReady

10/6/2020

I'm new to this forum and fairly new to Kubernetes. I'm having a problem with a GKE cluster: the status of one node keeps switching to NotReady. It has been happening at least once a day for the last two weeks. The big problem is that it happens during the daytime, when I really need the site to work, and my website goes down. When I restart it, everything goes back to normal, but that usually takes 20 minutes and I don't have the time (or the will) to do that every day.

Looking at the logs for the node, the pattern I can see is that these three messages always appear when the node changes its status to NotReady:

2020-10-06T07:58:03.782923Z curl: (28) Operation timed out after 10001 milliseconds with 0 bytes received
2020-10-06T07:58:03.782923Z Kubelet is unhealthy!
2020-10-06T07:58:21Z Node gke-cluster-default-pool-d02df301-cyfr status is now: NodeNotReady

Does anyone have the slightest idea of what I can do to fix or at least troubleshoot this?

Best regards, Eric

-- erijo999
google-kubernetes-engine
kubelet
kubernetes

2 Answers

10/7/2020

A node can become NotReady for a couple of reasons, such as:

  • Out of memory: usage may have reached or exceeded the threshold
  • Disk pressure: usage may have reached or exceeded the threshold
  • Network problem: this condition is set by the network plugin

Please refer to this answer to debug. In addition to the above, please also check the output of kubectl get events --all-namespaces.
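As a rough sketch of those checks, assuming kubectl is already pointed at the cluster: the function name node_checks is made up for illustration, and the node name you pass in is whatever kubectl get nodes reports for the affected node.

```shell
# Hypothetical helper; only the kubectl subcommands themselves are real.
node_checks() {
  node="$1"
  # Full node report: conditions such as MemoryPressure, DiskPressure and Ready,
  # plus the resources already allocated to pods on this node
  kubectl describe node "$node"
  # Only the condition types and their current status, one per line
  kubectl get node "$node" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
  # Cluster-wide events, oldest first
  kubectl get events --all-namespaces --sort-by=.metadata.creationTimestamp
}
```

For the node in the question this would be called as node_checks gke-cluster-default-pool-d02df301-cyfr, ideally shortly after the node flips to NotReady so the relevant events are still listed.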

From the little log output you have provided so far, it seems the kubelet is trying to perform some operation but can't, and the node is therefore being set to NotReady.

Please gather more logs and add them to the question; that will help figure out which operation the kubelet is failing to perform. If it turns out to be a problem with a WordPress application (hosted on Kubernetes), then this link may help.
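One possible way to collect kubelet logs from a GKE node is sketched below. It assumes the gcloud CLI is installed and authenticated, that you are allowed to SSH into the node, and that the kubelet runs as a systemd unit named kubelet on the node image; the function name kubelet_logs and both arguments are placeholders.

```shell
# Hypothetical helper; pass the node name and its zone.
kubelet_logs() {
  node="$1"
  zone="$2"
  # Assumption: the kubelet is a systemd unit on the node, so journalctl can read its log
  gcloud compute ssh "$node" --zone "$zone" \
    --command "sudo journalctl -u kubelet --since '1 hour ago' --no-pager"
}
```

Running it soon after the node goes NotReady, for example kubelet_logs gke-cluster-default-pool-d02df301-cyfr followed by the node's zone, should capture the messages around the failure.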

-- garlicFrancium
Source: StackOverflow

10/8/2020

Thank you all for your advice.

I spoke to a friend who knows a bit more about Kubernetes. He advised me to get a more powerful VM, so I upgraded from one with 1.7 GB of memory to one with 3.5 GB. Since the upgrade I haven't experienced the Node NotReady problem, which feels great.

I think another option would be to limit the resources available to the services in the cluster, to make sure that the kubelet always has the resources it needs. Then it might be possible to go back to a cheaper, less powerful machine type.
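A minimal sketch of what that could look like, assuming the services run as Deployments: the function name cap_deployment and the CPU and memory figures are placeholders, not values measured on this cluster.

```shell
# Hypothetical helper; the sizes below are illustrative and need tuning per service.
cap_deployment() {
  deploy="$1"
  # requests tell the scheduler how much to set aside for each container;
  # limits cap what each container is allowed to use
  kubectl set resources deployment "$deploy" \
    --requests=cpu=100m,memory=128Mi \
    --limits=cpu=250m,memory=256Mi
}
```

A container that goes over its memory limit is killed and restarted instead of eating into the memory the rest of the node depends on, which is the behaviour a small machine type would need.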

/Eric

-- erijo999
Source: StackOverflow