Container engine cluster version 1.1.7 nodes unavailable

2/10/2016

Yesterday I recreated a cluster so that it runs version 1.1.7 on both the master and the nodes. After deploying a first service there, the cluster is no longer operational as it should be.

I can't SSH into the nodes. Deployments fail with a FailedScheduling error. The Kube UI fails with the following response:

{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "no endpoints available for service \"kube-ui\"",
  "reason": "ServiceUnavailable",
  "code": 503
}

Resetting the nodes doesn't help here. Any ideas as to what could be causing this?
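
For reference, the symptoms can be inspected with commands along these lines (the kube-ui service lives in the kube-system namespace; exact names may differ per cluster):

# Check whether the kube-ui pod is running at all.
kubectl get pods --namespace=kube-system
# The 503 above means the service has no ready endpoints behind it.
kubectl get endpoints kube-ui --namespace=kube-system
# The FailedScheduling reasons show up in the event stream.
kubectl get events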

-- Jorrit Salverda
google-kubernetes-engine
kubernetes

2 Answers

2/16/2016

For anyone wondering what the cause of the issue was: we added more VMs to the cluster and set up resource requests/limits on each pod to prevent the whole cluster from running out of resources. This seems to have solved it. Alex, thanks again for your help.
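
For illustration, resource requests and limits are declared per container in the pod spec; the name, image and values below are made up, and the right numbers depend on the workload:

apiVersion: v1
kind: Pod
metadata:
  name: example-app                             # hypothetical name, for illustration
spec:
  containers:
  - name: example-app
    image: gcr.io/my-project/example-app:1.0    # hypothetical image
    resources:
      requests:                  # what the scheduler reserves on a node
        cpu: 100m
        memory: 128Mi
      limits:                    # hard cap enforced on the container
        cpu: 250m
        memory: 256Mi

With requests in place the scheduler only puts a pod on a node that still has that much capacity free, which is what keeps a single node (or the whole cluster) from being overcommitted.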

-- Etienne Tremel
Source: StackOverflow

2/10/2016

It sounds like the cluster's nodes are all unhealthy. That would explain why there is no kube-ui pod running and why scheduling fails. Not being able to SSH into them is incredibly strange, though.

What do kubectl get nodes and kubectl get node NODENAME -o yaml (swapping out NODENAME for the name of one of the nodes) return?
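
For reference, a node's health is recorded under status.conditions in that YAML; something along these lines shows it (NODENAME stands for one of the names printed by the first command, and describe is just an extra convenience):

# List the nodes and their Ready status.
kubectl get nodes
# Dump one node's full object; status.conditions (Ready, OutOfDisk, ...)
# usually says why the node is considered unhealthy.
kubectl get node NODENAME -o yaml
# Condensed view of the same conditions plus recent events.
kubectl describe node NODENAME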

-- Alex Robinson
Source: StackOverflow