Kubernetes breaks after OOM

5/25/2017

I ran into an issue with Kubernetes after an OOM event on the master node. The Kubernetes services looked OK, and there were no error or warning messages in the logs. But Kubernetes failed to process a new deployment that was created after the OOM happened.

I restarted Kubernetes with systemctl restart kube-*, and that solved the issue; Kubernetes began working normally again.
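
For reference, here is roughly what I ran. The exact unit names are an assumption and depend on how the control plane was installed; check with systemctl list-units 'kube-*':

    # Restart all control-plane units on the master node;
    # systemctl expands the glob against loaded unit names.
    systemctl restart 'kube-*'

    # Or restart the units one by one (names assumed; they vary by distro):
    systemctl restart kube-apiserver
    systemctl restart kube-controller-manager
    systemctl restart kube-scheduler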

I'm just wondering: is this expected behavior, or a bug in Kubernetes?

-- Pavel Prischepa
kubernetes

2 Answers

5/30/2017

It seems the problem wasn't caused by the OOM event after all. It was caused by kube-controller, regardless of whether an OOM happened or not.

If I restart kube-controller, Kubernetes begins processing deployments and pods normally.
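
A minimal sketch of the fix, assuming the controller runs as a systemd unit named kube-controller-manager (on some setups it runs as a static pod under the kubelet instead):

    # Restart only the controller manager.
    systemctl restart kube-controller-manager

    # Verify the control plane reports healthy components
    # (the usual check for Kubernetes of this era).
    kubectl get componentstatuses

    # Confirm new Deployments are reconciled again: the controller
    # should create a ReplicaSet and pods for a fresh Deployment.
    kubectl get deployments,replicasets,pods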

-- Pavel Prischepa
Source: StackOverflow

7/14/2017

It would be great if you could share kube-controller's log. When the API server crashes or is OOMKilled, there can be synchronization problems in early versions of Kubernetes (I remember we saw similar problems with DaemonSets, and I have a bug filed with the Kubernetes community), but they are rare.
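
For example, assuming a systemd-managed control plane, something like this would capture the logs worth sharing (unit and pod names are assumptions; adjust for your setup):

    # Dump recent controller-manager logs from the journal.
    journalctl -u kube-controller-manager --since "1 hour ago" > controller-manager.log

    # If the component runs as a pod in kube-system instead,
    # find it and grab its logs (the pod name varies per node):
    kubectl -n kube-system get pods | grep controller-manager
    kubectl -n kube-system logs <controller-manager-pod-name>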

Meanwhile, we did a lot of work to make Kubernetes production-ready: both tuning Kubernetes itself and crafting the other microservices that need to talk to it. I hope these blog entries help:

https://applatix.com/making-kubernetes-production-ready-part-2/ This one covers the 30+ knobs we used to tune Kubernetes.

https://applatix.com/making-kubernetes-production-ready-part-3/ This one covers microservice behavior that helps ensure cluster stability.

-- Hao Zhang
Source: StackOverflow