After a power cycle, a worker node becomes useless because its pods are stuck in the "ContainerCreating" state

8/28/2019

Scenario:

One of the worker nodes goes down due to a power cycle while the master is scheduling pods across the worker nodes.

Once the worker node comes back up after the power cycle, the master is able to schedule the remaining pods to it.

However, all the pods scheduled to that worker node remain stuck in the "ContainerCreating" state for a long time, which makes the node useless after the power cycle.
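To see which pods on the rebooted node are affected, one option is to filter the output of `kubectl get pods --all-namespaces -o json`. The sketch below assumes that JSON has already been parsed into a dict (a v1 PodList) and reports pods on a given node that have at least one container waiting with reason "ContainerCreating". The function name `pods_stuck_creating` is hypothetical, not part of kubectl or any Kubernetes client library.

```python
def pods_stuck_creating(pod_list: dict, node_name: str) -> list[str]:
    """Return "namespace/name" for pods scheduled on node_name that have a
    container waiting with reason "ContainerCreating".

    Assumes pod_list is the parsed JSON of
    `kubectl get pods --all-namespaces -o json` (a v1 PodList).
    """
    stuck = []
    for pod in pod_list.get("items", []):
        # Only look at pods the scheduler placed on the node in question.
        if pod.get("spec", {}).get("nodeName") != node_name:
            continue
        statuses = pod.get("status", {}).get("containerStatuses", [])
        for cs in statuses:
            # A container that has not been created yet reports a "waiting" state.
            waiting = cs.get("state", {}).get("waiting") or {}
            if waiting.get("reason") == "ContainerCreating":
                meta = pod["metadata"]
                stuck.append(f"{meta['namespace']}/{meta['name']}")
                break  # one stuck container is enough to report the pod
    return stuck
```

For example, save the kubectl JSON to a file, load it with `json.load`, and call `pods_stuck_creating(data, "k8sworker3")` to list the stuck pods on that node.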

Cluster Details:

Docker version: 18.06.1-ce

Kubernetes version: v1.14.0

Helm version: v2.12.1

Host OS: Centos 7

Installation method: Ansible Script

Kubelet log:

Line 322: Jul 26 15:44:57 k8sworker3 kubelet[1832]: E0726 15:44:57.842527    1832 cni.go:331] Error adding logging_filebeat-kvdjg/acb6582a56c6d77fdb3364a1e0ab1dd47d31f63c64eb197903dc82007be4c7df to network weave-net/weave: unable to allocate IP address: Post http://127.0.0.1:6784/ip/acb6582a56c6d77fdb3364a1e0ab1dd47d31f63c64eb197903dc82007be4c7df: dial tcp 127.0.0.1:6784: connect: connection refused
Line 326: Jul 26 15:44:57 k8sworker3 kubelet[1832]: weave-cni: unable to release IP address: Delete http://127.0.0.1:6784/ip/acb6582a56c6d77fdb3364a1e0ab1dd47d31f63c64eb197903dc82007be4c7df: dial tcp 127.0.0.1:6784: connect: connection refused
Line 342: Jul 26 15:44:58 k8sworker3 kubelet[1832]: E0726 15:44:58.073865    1832 cni.go:331] Error adding vz1-db-backup_vz1-warrior-job-5d242b94c6ba2500011bfedc-1564172937569-pwpq2/a991d0c781d5c3ec6c2dca9753fc8a1a2958b762a75b3d619f3da3744c41d160 to network weave-net/weave: unable to allocate IP address: Post http://127.0.0.1:6784/ip/a991d0c781d5c3ec6c2dca9753fc8a1a2958b762a75b3d619f3da3744c41d160: dial tcp 127.0.0.1:6784: connect: connection refused
Line 349: Jul 26 15:44:58 k8sworker3 kubelet[1832]: E0726 15:44:58.093351    1832 remote_runtime.go:109] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to set up sandbox container "acb6582a56c6d77fdb3364a1e0ab1dd47d31f63c64eb197903dc82007be4c7df" network for pod "filebeat-kvdjg": NetworkPlugin cni failed to set up pod "filebeat-kvdjg_logging" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/acb6582a56c6d77fdb3364a1e0ab1dd47d31f63c64eb197903dc82007be4c7df: dial tcp 127.0.0.1:6784: connect: connection refused
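Every "Error adding" line above ends with the weave CNI plugin being refused a connection on 127.0.0.1:6784, which suggests (this is an interpretation of the log, not a confirmed diagnosis) that nothing was listening on that port when the kubelet tried to set up pod networking. The sketch below pulls the affected namespace, pod, sandbox ID, network, and refused address out of such lines, and includes a plain TCP check for whether the port is accepting connections on the node. The regex is written only against the line format shown above; `refused_cni_adds` and `port_open` are hypothetical helper names.

```python
import re
import socket

# Matches kubelet lines like:
#   Error adding <ns>_<pod>/<sandbox> to network <net>: ... dial tcp <addr>: connect: connection refused
# Namespaces and pod names cannot contain "_", so the first "_" splits them.
ADD_ERR = re.compile(
    r"Error adding (?P<ns>[^_\s]+)_(?P<pod>[^/\s]+)/(?P<sandbox>[0-9a-f]+)"
    r" to network (?P<net>\S+?): .*?dial tcp (?P<addr>[\d.]+:\d+): connect: connection refused"
)


def refused_cni_adds(lines):
    """Yield (namespace, pod, sandbox_id, network, address) for each
    CNI "Error adding" line whose failure was a refused TCP connection."""
    for line in lines:
        m = ADD_ERR.search(line)
        if m:
            yield (m["ns"], m["pod"], m["sandbox"], m["net"], m["addr"])


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers connection refused and timeouts
        return False
```

On the affected node, the kubelet journal (for example `journalctl -u kubelet`) could be fed line by line into `refused_cni_adds`, and `port_open("127.0.0.1", 6784)` shows whether anything is listening on the port the plugin was calling at that moment.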

Please suggest how to prevent this issue.

-- Bhavani Prasad
kubernetes