VMs running Kubernetes clusters go down periodically

11/10/2018

We are running several Kubernetes clusters on a few hundred VMs. A few VMs go down every week, and we bring them back up. Our metrics show that CPU and memory usage are low to moderate on these VMs when they go down. Other VM metrics (such as network traffic) don't show any unusual patterns either. There are no relevant messages in /var/log/messages around the time the VMs go down.

Kubernetes version: 1.9
Linux kernel version: 4.1.12-124.19.5.el7uek.x86_64

Are there other logs or diagnostic information we can check to get to the root cause of the VM outages?

-- sengs
kubernetes
virtual-machine

1 Answer

11/11/2018

We usually also check the host's systemd journal, especially if you are running kubelet as a systemd service.
There is a good tutorial on DigitalOcean explaining journald and journalctl.

https://www.digitalocean.com/community/tutorials/how-to-use-journalctl-to-view-and-manipulate-systemd-logs

It might give you some clue as to why your kube nodes are crashing.
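
As a rough sketch of what to look at once a VM is back up, the Python snippet below builds a few journalctl queries against the previous boot (the one that ended in the outage). Some assumptions to check on your side: the kubelet unit is named `kubelet`, the journal is persistent (otherwise journald keeps logs only for the current boot and `--boot=-1` finds nothing), and the helper names `journalctl_cmd` and `crash_queries` are made up for this example.

```python
import subprocess


def journalctl_cmd(boot=-1, unit=None, kernel=False, priority=None, lines=200):
    """Build a journalctl argument list for one boot.

    boot=-1 selects the previous boot, i.e. the one that ended in the crash.
    lines limits output to the most recent entries, the ones closest to the crash.
    """
    cmd = ["journalctl", "--no-pager", f"--boot={boot}", f"--lines={lines}"]
    if unit:
        cmd.append(f"--unit={unit}")          # restrict to one systemd unit
    if kernel:
        cmd.append("--dmesg")                 # kernel messages only
    if priority:
        cmd.append(f"--priority={priority}")  # this priority and more severe
    return cmd


def crash_queries(kubelet_unit="kubelet"):
    """Queries worth running after a node comes back up.

    kubelet_unit is an assumption; use the unit name from your own hosts.
    """
    return [
        ["journalctl", "--no-pager", "--list-boots"],   # which boots are recorded
        journalctl_cmd(kernel=True),                    # kernel log before the crash
        journalctl_cmd(unit=kubelet_unit),              # kubelet log before the crash
        journalctl_cmd(priority="err"),                 # errors from any unit
    ]


def run_queries(queries):
    """Run each query on the local host and print its output."""
    for cmd in queries:
        print("$", " ".join(cmd))
        subprocess.run(cmd, check=False)
```

Calling `run_queries(crash_queries())` on the affected VM (as root or a user in the `systemd-journal` group) prints the tail of each log. If `--list-boots` shows only the current boot, the journal is probably volatile; setting `Storage=persistent` in /etc/systemd/journald.conf and restarting systemd-journald should make logs from the next outage survive the reboot.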

-- Bal Chua
Source: StackOverflow