Kubernetes pod crashing because of network error

6/29/2019

It has already happened twice this week. On the pod description I get this:

  Type     Reason           Age              From                                                   Message
  ----     ------           ----             ----                                                   -------
  Warning  NetworkNotReady  2m (x3 over 2m)  kubelet, gke-iagree-cluster-1-main-pool-5632d628-wgzr  network is not ready: [runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: Kubenet does not have netConfig. This is most likely due to lack of PodCIDR]
  Normal   SandboxChanged   46s              kubelet, gke-iagree-cluster-1-main-pool-5632d628-wgzr  Pod sandbox changed, it will be killed and re-created.

I want to explain a bit more about what is happening: everything is working normally, and then all of a sudden this occurs. I'm adding the node's description next:

  Type     Reason                     Age                From                                                            Message
  ----     ------                     ----               ----                                                            -------
  Warning  OOMKilling                 44m                kernel-monitor, gke-iagree-cluster-1-main-pool-5632d628-wgzr   Memory cgroup out of memory: Kill process 1560920 (runc:[2:INIT]) score 0 or sacrifice child
Killed process 1560920 (runc:[2:INIT]) total-vm:131144kB, anon-rss:2856kB, file-rss:5564kB, shmem-rss:0kB
  Warning  TaskHung                   31m                kernel-monitor, gke-iagree-cluster-1-main-pool-5632d628-wgzr   INFO: task dockerd:1883293 blocked for more than 300 seconds.
  Normal   NodeAllocatableEnforced    30m                kubelet, gke-iagree-cluster-1-main-pool-5632d628-wgzr          Updated Node Allocatable limit across pods
  Normal   NodeHasSufficientDisk      30m (x2 over 30m)  kubelet, gke-iagree-cluster-1-main-pool-5632d628-wgzr          Node gke-iagree-cluster-1-main-pool-5632d628-wgzr status is now: NodeHasSufficientDisk
  Normal   NodeHasSufficientMemory    30m (x2 over 30m)  kubelet, gke-iagree-cluster-1-main-pool-5632d628-wgzr          Node gke-iagree-cluster-1-main-pool-5632d628-wgzr status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure      30m (x2 over 30m)  kubelet, gke-iagree-cluster-1-main-pool-5632d628-wgzr          Node gke-iagree-cluster-1-main-pool-5632d628-wgzr status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID       30m                kubelet, gke-iagree-cluster-1-main-pool-5632d628-wgzr          Node gke-iagree-cluster-1-main-pool-5632d628-wgzr status is now: NodeHasSufficientPID
  Warning  Rebooted                   30m                kubelet, gke-iagree-cluster-1-main-pool-5632d628-wgzr          Node gke-iagree-cluster-1-main-pool-5632d628-wgzr has been rebooted, boot id: ecd3db95-4bfc-4df5-85b3-70df05f6fb48
  Normal   Starting                   30m                kubelet, gke-iagree-cluster-1-main-pool-5632d628-wgzr          Starting kubelet.
  Normal   NodeNotReady               30m                kubelet, gke-iagree-cluster-1-main-pool-5632d628-wgzr          Node gke-iagree-cluster-1-main-pool-5632d628-wgzr status is now: NodeNotReady
  Normal   NodeReady                  30m                kubelet, gke-iagree-cluster-1-main-pool-5632d628-wgzr          Node gke-iagree-cluster-1-main-pool-5632d628-wgzr status is now: NodeReady
  Normal   Starting                   29m                kube-proxy, gke-iagree-cluster-1-main-pool-5632d628-wgzr       Starting kube-proxy.
  Normal   FrequentKubeletRestart     25m                systemd-monitor, gke-iagree-cluster-1-main-pool-5632d628-wgzr  Node condition FrequentKubeletRestart is now: False, reason: FrequentKubeletRestart
  Normal   CorruptDockerOverlay2      25m                docker-monitor, gke-iagree-cluster-1-main-pool-5632d628-wgzr   Node condition CorruptDockerOverlay2 is now: False, reason: CorruptDockerOverlay2
  Normal   UnregisterNetDevice        25m                kernel-monitor, gke-iagree-cluster-1-main-pool-5632d628-wgzr   Node condition FrequentUnregisterNetDevice is now: False, reason: UnregisterNetDevice
  Normal   FrequentDockerRestart      25m                systemd-monitor, gke-iagree-cluster-1-main-pool-5632d628-wgzr  Node condition FrequentDockerRestart is now: False, reason: FrequentDockerRestart
  Normal   FrequentContainerdRestart  25m                systemd-monitor, gke-iagree-cluster-1-main-pool-5632d628-wgzr  Node condition FrequentContainerdRestart is now: False, reason: FrequentContainerdRestart
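
For reference, both event listings above come from kubectl describe; the node name is taken from the events, and the pod name is a placeholder:

  kubectl describe pod <pod-name>
  kubectl describe node gke-iagree-cluster-1-main-pool-5632d628-wgzr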
-- PaulMB
gcloud
kubernetes

2 Answers

7/3/2019

These errors may arise on GKE clusters running version 1.11.x due to this known issue: gke-issue.

The issue can be resolved by upgrading the GKE cluster master and nodes to version 1.12.5-gke.5 or 1.12.7-gke.10.
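
With gcloud the upgrade could look like the sketch below, where the cluster name and node pool are assumptions inferred from the node name above and the zone is a placeholder:

  # Assumed names: cluster "iagree-cluster-1", node pool "main-pool";
  # replace <zone> with the cluster's zone. Upgrade the master first:
  gcloud container clusters upgrade iagree-cluster-1 \
      --zone <zone> --master --cluster-version 1.12.7-gke.10

  # Then upgrade the node pool to the same version:
  gcloud container clusters upgrade iagree-cluster-1 \
      --zone <zone> --node-pool main-pool --cluster-version 1.12.7-gke.10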

-- MaggieO
Source: StackOverflow

6/29/2019

From the error, it seems like you are running out of IPs in your CNI. When setting up the kubenet CNI for networking, you must have passed a CIDR range, which determines the number of allocatable pod IPs in the cluster.
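
On GKE with kubenet (a routes-based cluster), that pod range is fixed when the cluster is created; a minimal sketch, with the cluster name and range as placeholders:

  # Placeholder name and range; a /14 allows roughly 260k pod IPs cluster-wide.
  gcloud container clusters create my-cluster \
      --cluster-ipv4-cidr 10.0.0.0/14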

I am not sure exactly how kubenet maps an IP to a pod. If it uses its own virtual network, you need to use a wider CIDR range; if it takes IPs from the host network interface, then you need to choose machines with more subnetwork interfaces (this is how the AWS VPC CNI works).
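
As a quick check, assuming kubectl access to the cluster: the pod error above blames a missing PodCIDR, so you can inspect what each node was actually assigned and how many pods sit on the affected node:

  # Show the pod CIDR allocated to each node:
  kubectl get nodes -o custom-columns=NAME:.metadata.name,PODCIDR:.spec.podCIDR

  # List the pods scheduled on the affected node (name taken from the events):
  kubectl get pods --all-namespaces -o wide \
      --field-selector spec.nodeName=gke-iagree-cluster-1-main-pool-5632d628-wgzr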

-- Vaibhav Jain
Source: StackOverflow