Inconsistent behaviour of Kubernetes slaves: a few slaves don't show up

5/19/2016

I have a Kubernetes master set up in AWS, balanced by an ELB. I create 5-6 instances using Terraform, provision them to be kube slaves, and point the kubelets at the ELB. When I run kubectl get nodes, only 3 or 4 instances show up. It looks like slave registration with the master fails for a few nodes, even though all nodes are identical.

It's random behaviour; sometimes all slaves show up just fine.

-- Gautham V kidiyoor
containers
coreos
kubernetes
kubernetes-health-check

2 Answers

5/22/2016

Answering my own question -

I name slave nodes after their PrivateIP. I dynamically spawn slaves, attach them to the master, schedule pods, and destroy the slaves after the job is done, but I never deleted these nodes from Kubernetes, i.e. 'kubectl delete node <node-name>'.

All these destroyed slave nodes were left in the 'NotReady' state with name=PrivateIP.

Since the slaves are destroyed, their PrivateIPs are returned to the AWS IP pool, and newly spawned instances can take those IPs.

So when I spawn new slaves and try to attach them to the master, it is possible that a few of them get the same PrivateIP as slaves still sitting in the 'NotReady' state (since those slaves were destroyed and their IPs already released).

Hence Kubernetes would simply flip the status of the old node back to 'Ready', which went unnoticed earlier because I was programmatically waiting for new nodes to show up.

Note:

Destroy means terminating the AWS instance.

Delete means detaching the slave from Kubernetes, i.e. kubectl delete node <node-name>.
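
A minimal sketch of the cleanup that fixes this, assuming the node was registered under the instance's PrivateIP and that kubectl is pointed at the cluster (the NotReady sweep is my own assumption and would also delete any node that is merely flapping):

    # Delete the node object right after terminating the instance, so a future
    # instance that reuses the same PrivateIP registers as a brand-new node.
    kubectl delete node "${PRIVATE_IP}"

    # Or sweep every stale NotReady node in one go (column 2 is STATUS).
    kubectl get nodes --no-headers | awk '$2 == "NotReady" {print $1}' \
        | xargs -r kubectl delete node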

-- Gautham V kidiyoor
Source: StackOverflow

5/21/2016

This might be a race condition, from my own experience with AWS and Terraform.

ELBs usually require more time than EC2 instances to become ready, so if for any reason a kubelet starts before the ELB is able to serve, the node will just fail to register ("host not found" or "error 500", depending on the timing).

You can mitigate that in two ways:

  • make your kubelet service/container restart automatically on failure
  • create a strict dependency between the EC2 instances and the ELB, with a readiness check on the ELB (an HTTP call would suffice); see the sketch below
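
A minimal sketch of the second mitigation as a kubelet wrapper script, assuming the master API is reachable through the ELB at a hypothetical KUBE_MASTER_ELB DNS name and that the apiserver's /healthz endpoint is exposed through it:

    #!/bin/bash
    # Hypothetical pre-start gate: block until the ELB actually serves the
    # apiserver's /healthz endpoint, then hand off to the kubelet.
    ELB_URL="https://${KUBE_MASTER_ELB}"   # assumed env var holding the ELB DNS name

    until curl -ksf "${ELB_URL}/healthz" >/dev/null; do
        echo "waiting for ${ELB_URL} to become ready..." >&2
        sleep 5
    done

    # --api-servers is the kubelet flag of that era; adjust to your version.
    exec kubelet --api-servers="${ELB_URL}" "$@"

For the first bullet, Restart=on-failure (plus RestartSec) in the kubelet's systemd unit gives you the automatic restart; on the Terraform side, depends_on from the instances to the ELB resource enforces creation order, though it does not by itself wait for the ELB to pass health checks, which is why a gate like the one above still helps.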

I would need the logs from the kubelet to validate that theory, of course.

-- Antoine Cotten
Source: StackOverflow