Yesterday I recreated a cluster so that both the master and the nodes run version 1.1.7. After deploying a first service there, the cluster is no longer operational the way it should be.
I can't SSH into the nodes. Deployments fail with a FailedScheduling error. The Kube UI fails with the following response:
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {},
"status": "Failure",
"message": "no endpoints available for service \"kube-ui\"",
"reason": "ServiceUnavailable",
"code": 503
}
Resetting the nodes doesn't help either. Any ideas as to what could cause this?
For anyone wondering what the cause of the issue was: we added more VMs to the cluster and set up resource requests/limits on each pod to prevent the whole cluster from running out of resources. This seems to have solved it. Alex, thanks again for your help.
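For reference, this is roughly what the per-pod settings look like; it's a minimal sketch, and the pod name, container name, image, and values are placeholders rather than the ones we actually used:

apiVersion: v1
kind: Pod
metadata:
  name: example-pod            # hypothetical name
spec:
  containers:
  - name: app                  # hypothetical container
    image: example/app:1.0     # placeholder image
    resources:
      requests:                # what the scheduler reserves on a node
        cpu: 100m
        memory: 128Mi
      limits:                  # hard cap the container may not exceed
        cpu: 250m
        memory: 256Mi

With requests set, the scheduler only places a pod on a node that still has that much spare capacity, which is why adding VMs plus setting requests/limits made the FailedScheduling errors go away for us.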
It sounds like the cluster's nodes are all unhealthy. That would explain why there's no kube-ui pod running and why you're seeing the scheduling errors. Not being able to SSH into them is incredibly strange.
What do the following return (swapping out NODENAME for one of the node names)?

kubectl get nodes
kubectl get node NODENAME -o yaml
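If it helps, here are a couple of other read-only checks that usually state outright why the nodes are unhealthy and why pods can't be scheduled (again, NODENAME is a placeholder for one of your node names):

# Node conditions (Ready, OutOfDisk, ...) plus recent events for that node
kubectl describe node NODENAME

# Cluster events; FailedScheduling events normally include the reason
kubectl get events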