I'm creating a new EKS Kubernetes Cluster on AWS.
When I deploy my workloads (migrating from an existing cluster), the kubelet stops posting node status and all worker nodes become "NotReady" within a minute.
I assumed that a misconfiguration within my cluster should not make the nodes crash - but apparently it does.
Can a misconfiguration within my cluster really make the AWS EKS worker nodes "NotReady"? Are there any rules of thumb for when this can happen? CPU load too high? Pods in kube-system crashing?
You can start by describing the node and checking its Conditions and Events:

kubectl describe node $BAD_NODE

Then SSH into the node and look at the kernel log for OOM kills or disk/network errors:

sudo dmesg -T

Try restarting the kubelet on the node:

systemctl restart kubelet

(or /etc/init.d/kubelet restart on older init systems).

If the node does not recover, drain it and delete it so the node group can replace it:

kubectl drain <node-name>
kubectl delete node <node-name>
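The steps above can be sketched as a small triage script. This is a sketch, not a definitive runbook: the node name is a placeholder argument, and the journalctl/systemctl lines assume a systemd-based AMI (true of the standard EKS-optimized images); the on-node commands are left as comments since they run over SSH or SSM, not from your workstation.

```shell
#!/usr/bin/env bash
# Triage a "NotReady" EKS worker node. Usage: ./triage-node.sh <node-name>
set -euo pipefail
NODE="${1:?usage: $0 <node-name>}"

# 1. Inspect what the control plane reports (look at Conditions and Events).
kubectl describe node "$NODE"

# 2. On the node itself (via SSH or SSM session), check logs and the kubelet:
#    sudo dmesg -T | tail -n 50                        # OOM kills, disk, network errors
#    sudo journalctl -u kubelet --since "10 min ago"   # why the kubelet stopped posting status
#    sudo systemctl restart kubelet                    # or /etc/init.d/kubelet restart

# 3. If the node never recovers, remove it so the node group replaces it.
kubectl drain "$NODE" --ignore-daemonsets
kubectl delete node "$NODE"
```

The `--ignore-daemonsets` flag is needed for most real nodes, since DaemonSet pods (e.g. kube-proxy, the VPC CNI) cannot be evicted and would otherwise block the drain.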