networkPlugin cni failed to set up pod error during Kubernetes deployment

1/18/2020

I am trying to deploy a sample Spring Boot microservice into my Kubernetes cluster. All of my nodes are in the Ready state, but when I deploy, the pod stays stuck in ContainerCreating.

When I describe the pod, I see the messages networkPlugin cni failed to set up pod and unable to allocate IP address.

The output of my pod describe command looks like the following:

Events:
  Type     Reason                  Age                    From                   Message
  ----     ------                  ----                   ----                   -------
  Normal   Scheduled               <unknown>              default-scheduler      Successfully assigned default/spacestudysecurityauthcontrol-deployment-57596f4795-jxxvj to mildevkub040
  Warning  FailedCreatePodSandBox  53m                    kubelet, mildevkub040  Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "2499f91b4a1173fb854a47ba1910d1fc3f18cfb35bf5c38c9a3008e19d385e15" network for pod "spacestudysecurityauthcontrol-deployment-57596f4795-jxxvj": networkPlugin cni failed to set up pod "spacestudysecurityauthcontrol-deployment-57596f4795-jxxvj_default" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/2499f91b4a1173fb854a47ba1910d1fc3f18cfb35bf5c38c9a3008e19d385e15: dial tcp 127.0.0.1:6784: connect: connection refused, failed to clean up sandbox container "2499f91b4a1173fb854a47ba1910d1fc3f18cfb35bf5c38c9a3008e19d385e15" network for pod "spacestudysecurityauthcontrol-deployment-57596f4795-jxxvj": networkPlugin cni failed to teardown pod "spacestudysecurityauthcontrol-deployment-57596f4795-jxxvj_default" network: Delete http://127.0.0.1:6784/ip/2499f91b4a1173fb854a47ba1910d1fc3f18cfb35bf5c38c9a3008e19d385e15: dial tcp 127.0.0.1:6784: connect: connection refused]
  Normal   SandboxChanged          3m40s (x228 over 53m)  kubelet, mildevkub040  Pod sandbox changed, it will be killed and re-created.
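
The connection refused error above points at 127.0.0.1:6784, which as far as I understand is Weave Net's local HTTP status endpoint on the node. A quick check I can run directly on mildevkub040 would be something like:

# on the affected node (mildevkub040): ask the local weave router for its status
# (6784 is the port shown in the "connection refused" error above)
curl -s http://127.0.0.1:6784/status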

When I check the weave container log, I see the following:

INFO: 2020/01/09 12:18:12.061328 ->[192.168.16.178:42838] connection shutting down due to error during handshake: write tcp 192.168.16.177:6783->192.168.16.178:42838: write: connection reset by peer
INFO: 2020/01/09 12:18:18.998360 ->[192.168.16.178:37570] connection accepted
INFO: 2020/01/09 12:18:20.653339 ->[192.168.16.178:45223] connection shutting down due to error during handshake: write tcp 192.168.16.177:6783->192.168.16.178:45223: write: connection reset by peer
INFO: 2020/01/09 12:18:21.122204 overlay_switch ->[56:60:12:a9:76:d1(mildevkub050)] using fastdp
INFO: 2020/01/09 12:18:21.742168 ->[192.168.16.178:6783|56:60:12:a9:76:d1(mildevkub050)]: connection deleted
INFO: 2020/01/09 12:18:21.800670 ->[192.168.16.178:6783] attempting connection
INFO: 2020/01/09 12:18:22.470207 ->[192.168.16.175:59923] connection accepted
INFO: 2020/01/09 12:18:22.912690 ->[192.168.16.175:6783|be:b1:3f:a4:34:88(mildevkub020)]: connection deleted
INFO: 2020/01/09 12:18:22.918075 Removed unreachable peer be:b1:3f:a4:34:88(mildevkub020)
INFO: 2020/01/09 12:18:22.918144 Removed unreachable peer 56:60:12:a9:76:d1(mildevkub050)
INFO: 2020/01/09 12:18:24.602093 ->[192.168.16.175:6783] attempting connection
INFO: 2020/01/09 12:18:26.782123 ->[192.168.16.178:6783|56:60:12:a9:76:d1(mildevkub050)]: connection ready; using protocol version 2
INFO: 2020/01/09 12:18:27.918518 ->[192.168.16.175:59923|be:b1:3f:a4:34:88(mildevkub020)]: connection ready; using protocol version 2
INFO: 2020/01/09 12:18:29.365629 ->[192.168.16.178:37570|56:60:12:a9:76:d1(mildevkub050)]: connection ready; using protocol version 2
INFO: 2020/01/09 12:18:29.864370 overlay_switch ->[56:60:12:a9:76:d1(mildevkub050)] using fastdp
INFO: 2020/01/09 12:18:30.086645 overlay_switch ->[56:60:12:a9:76:d1(mildevkub050)] using fastdp
INFO: 2020/01/09 12:18:30.090275 overlay_switch ->[be:b1:3f:a4:34:88(mildevkub020)] using fastdp
INFO: 2020/01/09 12:18:30.100874 ->[192.168.16.178:37570|56:60:12:a9:76:d1(mildevkub050)]: connection added (new peer)
INFO: 2020/01/09 12:18:30.104237 ->[192.168.16.178:37570|56:60:12:a9:76:d1(mildevkub050)]: connection deleted
INFO: 2020/01/09 12:18:30.104284 ->[192.168.16.178:6783|56:60:12:a9:76:d1(mildevkub050)]: connection added (new peer)
INFO: 2020/01/09 12:18:30.104371 ->[192.168.16.175:59923|be:b1:3f:a4:34:88(mildevkub020)]: connection added (new peer)
INFO: 2020/01/09 12:18:30.776275 ->[192.168.16.178:37570|56:60:12:a9:76:d1(mildevkub050)]: connection shutting down due to error: Multiple connections to 56:60:12:a9:76:d1(mildevkub050) added to 5a:67:92:b3:58:ce(mildevkub040)
INFO: 2020/01/09 12:18:44.305079 ->[192.168.16.175:6783|be:b1:3f:a4:34:88(mildevkub020)]: connection ready; using protocol version 2
INFO: 2020/01/09 12:18:45.200565 overlay_switch ->[be:b1:3f:a4:34:88(mildevkub020)] using fastdp
INFO: 2020/01/09 12:18:45.458203 ->[192.168.16.175:59923|be:b1:3f:a4:34:88(mildevkub020)]: connection fully established
INFO: 2020/01/09 12:18:45.461157 ->[192.168.16.175:6783|be:b1:3f:a4:34:88(mildevkub020)]: connection shutting down due to error: Multiple connections to be:b1:3f:a4:34:88(mildevkub020) added to 5a:67:92:b3:58:ce(mildevkub040)
INFO: 2020/01/09 12:18:45.470667 ->[192.168.16.178:6783|56:60:12:a9:76:d1(mildevkub050)]: connection fully established
INFO: 2020/01/09 12:18:45.688871 sleeve ->[192.168.16.178:6783|56:60:12:a9:76:d1(mildevkub050)]: Effective MTU verified at 1438
INFO: 2020/01/09 12:18:45.874380 sleeve ->[192.168.16.175:6783|be:b1:3f:a4:34:88(mildevkub020)]: Effective MTU verified at 1438
INFO: 2020/01/09 12:24:12.026645 ->[192.168.16.178:6783|56:60:12:a9:76:d1(mildevkub050)]: connection shutting down due to error: write tcp 192.168.16.177:38313->192.168.16.178:6783: write: connection reset by peer
INFO: 2020/01/09 12:25:56.708405 ->[192.168.16.178:44120] connection accepted
INFO: 2020/01/09 12:26:31.769826 overlay_switch ->[56:60:12:a9:76:d1(mildevkub050)] sleeve timed out waiting for UDP heartbeat
INFO: 2020/01/09 12:26:41.819554 ->[192.168.16.175:59923|be:b1:3f:a4:34:88(mildevkub020)]: connection shutting down due to error: write tcp 192.168.16.177:6783->192.168.16.175:59923: write: connection reset by peer
INFO: 2020/01/09 12:28:17.563133 ->[192.168.16.178:6783|56:60:12:a9:76:d1(mildevkub050)]: connection deleted
INFO: 2020/01/09 12:30:49.548347 ->[192.168.16.178:60937] connection accepted

When I run the command kubectl exec -n kube-system weave-net-fj9mm -c weave -- /home/weave/weave --local status ipam, I get the response: Error from server (NotFound): pods "weave-net-fj9mm" not found.
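
I assume the pod name in that command is stale; listing the weave-net pods per node should give the current name (the name=weave-net label is, as far as I know, what the standard Weave Net DaemonSet uses):

# list the weave-net pods together with the node each one runs on,
# so the exec/logs commands can target the pod that is actually on mildevkub040
kubectl get pods -n kube-system -l name=weave-net -o wide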

How can I resolve this issue?

-- Jacob
kubernetes
weave

1 Answer

2/15/2020

If you curl the URL that appears in the pod describe output, you will get something like this:

# curl 'http://127.0.0.1:6784/status'
        Version: 1.8.2 (version 1.9.1 available - please upgrade!)

        Service: router
       Protocol: weave 1..2
           Name: 66:2b:6a:ca:34:88(ip-10-128-152-185)
     Encryption: disabled
  PeerDiscovery: enabled
        Targets: 4
    Connections: 4 (3 established, 1 failed)
          Peers: 4 (with 12 established connections)
 TrustedSubnets: none

        Service: ipam
         Status: waiting for IP range grant from peers
          Range: 10.32.0.0/12
  DefaultSubnet: 10.32.0.0/12

"waiting for IP range grant from peers" status indicates that Weave Net's IPAM believes that all the IP address space is owned by other nodes in the cluster, but actually none of those nodes are able to be contacted at the moment.

Here's the workaround. Big red warnings:

  • Make sure all the unreachable hosts really are gone forever before running this.
  • Do not run this on more than one node.
  • This may screw up your Kubernetes cluster if something goes wrong.
  • There is a failsafe 'echo' added to the command in case you didn't read the above warnings; remove it only after checking the peer names it prints (see the sketch after the output below).
% for i in $(curl -s 'http://127.0.0.1:6784/status/ipam' | grep 'unreachable!' | sort -k2 -n -r | awk -F'(' '{print $2}' | sed 's/).*//'); do echo curl -X DELETE 127.0.0.1:6784/peer/$i; done
65536 IPs taken over from ip-10-128-184-15
32768 IPs taken over from ip-10-128-159-154
32768 IPs taken over from ip-10-128-170-84
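
The loop above only prints the curl commands because of the failsafe echo. Once the peer names it prints match hosts you know are gone for good, drop the echo and run the deletes for real; a sketch under the same assumptions (the "IPs taken over from ..." lines above are what the actual DELETE calls return):

# WARNING: destructive - run on ONE node only, and only after reviewing
# the peer list printed by the echo version above
for i in $(curl -s 'http://127.0.0.1:6784/status/ipam' | grep 'unreachable!' | awk -F'(' '{print $2}' | sed 's/).*//'); do
  curl -X DELETE 127.0.0.1:6784/peer/$i   # reclaims the IP ranges owned by peer $i
done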

Reference - https://github.com/weaveworks/weave/issues/2822

-- Devesh mehta
Source: StackOverflow