Why does Kubernetes produce multiple errors when CPU usage is high?

7/16/2017

I'm using Kubernetes on GKE (a single-node cluster), which is very nice to use. However, I'm experiencing multiple errors that make all the pods unresponsive:

  • kubectl exec: Error from server: error dialing backend: ssh: rejected: connect failed (Connection refused)
  • nginx-ingress controller logs: service staging/myservice does not have any active endpoints
  • kubectl top nodes: Error from server (InternalError): an error on the server ("unknown") has prevented the request from succeeding (get services http:heapster:)

This happens when CPU usage is high (at or near 100%, in my case due to parallel Jenkins builds).

I do set resource requests and limits (sometimes both) for a few pods, but even those pods become unreachable and, at some point, restart. The termination reason is almost always "Completed" with exit code 0, and occasionally "Error" with various exit codes (2, 137, 255, for example).
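
For reference, this is roughly how I declare them; the pod name, image, and values below are placeholders, not my actual workload:

    apiVersion: v1
    kind: Pod
    metadata:
      name: example-app        # placeholder name, not my real workload
    spec:
      containers:
      - name: app
        image: nginx:1.13      # placeholder image
        resources:
          requests:
            cpu: "250m"        # scheduler reserves a quarter of a core
            memory: "256Mi"
          limits:
            cpu: "500m"        # the container is throttled above half a core
            memory: "512Mi"    # the container is OOM-killed above this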

I've also noticed this error from the replication controllers: Error syncing pod, skipping: network is not ready: [runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: Kubenet does not have netConfig. This is most likely due to lack of PodCIDR]

Kubernetes is normally supposed to keep services available across the cluster, even when individual pods are under load.
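
I suspect the "does not have any active endpoints" message is related to readiness: as far as I understand, a Service only routes traffic to pods whose readiness probe passes, and a CPU-starved pod can easily miss its probe deadline. A minimal sketch of such a probe (the path, port, and timings are hypothetical, not my actual configuration):

    # Hypothetical readiness probe; path, port, and timings are assumptions.
    readinessProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10
      timeoutSeconds: 1    # a CPU-starved pod can easily exceed this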

How can this behavior be explained? What's the recommended way to prevent it?

-- Neko
kubectl
kubelet
kubernetes

0 Answers