kubectl get nodes hangs when I delete a node externally

5/28/2019

Been experimenting with Kubernetes/Rancher and encountered some unexpected behavior. Today I'm deliberately putting on my chaos monkey hat and learning how things behave when stuff fails.

Here's what I've done:

1) Using the Rancher UI, stand up a 3 node cluster on Digital Ocean. [image: DO 3 node cluster] Success -- a few mins later I have a 3 node cluster, visible in Rancher.

2) Using the Rancher UI, delete a node in a 'happy' scenario, where I push the appropriate node delete button in Rancher.

Some minutes later, I have a 2 node cluster. Great. [image: 2 node cluster]

3) Using the Digital Ocean admin UI, delete a node in an 'oops' scenario, as if a sysadmin accidentally deleted a node. [image: delete from DO admin UI]

Back on the ranch (sorry), I click here to view the state of the cluster: [image: View cluster]

Unfortunately, after three minutes, I'm getting a gateway timeout: [image: Gateway timeout]

Detailed timeouts in Chrome network inspector: [image: timeouts]

Here's what kubectl says:

$ kubectl get nodes
Error from server (Timeout): the server was unable to return a response in the time allotted, but may still be processing the request (get nodes)

So, question is, what happened here? I was under the impression Kubernetes was 'self healing' and even if this node I deleted was the etcd leader, it would eventually recover. Been around 2 hours -- do I just need to wait more?
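For context on the "eventually recover" expectation: etcd only keeps serving requests while a majority (quorum) of its members is reachable, so recovery is not guaranteed once too many members are gone. Below is a minimal sketch of that arithmetic. It assumes every node in the cluster carried the etcd role, which the steps above do not confirm, and the function names `quorum` and `tolerated_failures` are made up for illustration, not part of any tool.

```shell
# Hypothetical helpers, for illustration only.

# Majority that an etcd cluster of N members needs in order to stay available.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of members that can be lost before quorum is gone.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

# Print the numbers for the cluster sizes seen in this experiment.
for members in 3 2 1; do
  echo "members=$members quorum=$(quorum "$members") tolerated=$(tolerated_failures "$members")"
done
```

Under that assumption, a 3 member cluster tolerates one lost member, while a 2 member cluster tolerates none, so losing one of two etcd members would leave the API server unable to answer requests such as `kubectl get nodes`.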

-- paws
digital-ocean
kubernetes
rancher
