Wait for canal pod before scheduling pods on new node

7/12/2017

When creating/adding a node to kubernetes, we also have to create a Canal pod.

Currently, kubernetes does not wait for the Canal pod to be ready before trying to schedule pods, resulting in failures (error below):

Error syncing pod, skipping: failed to "CreatePodSandbox" for "nginx-2883150634-fh5s2_default(385d61d6-6662-11e7-8989-000d3af349de)" with CreatePodSandboxError: "CreatePodSandbox for pod \"nginx-2883150634-fh5s2_default(385d61d6-6662-11e7-8989-000d3af349de)\" failed: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod \"nginx-2883150634-fh5s2_default\" network: failed to find plugin \"loopback\" in path [/opt/loopback/bin /opt/cni/bin]"

Once the Canal pod is up-and-running, simply deleting the failing pod(s) will fix the issue.

My question is: what would be the right way to tell kubernetes to wait for the network pod to be ready before trying to schedule pods on the node?

  • Should I taint the node to only allow Canal, and untaint it once Canal is ready?
  • Should I script the deletion of failed pods once Canal is ready?
  • Is there a configuration or a way to do it that eliminates the issue?
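For context, the taint approach I have in mind would look roughly like this. This is only a sketch: the node name, the taint key (node.example/network-unavailable), and the Canal pod label (k8s-app=canal) are placeholders for whatever your setup actually uses, and it assumes the Canal DaemonSet tolerates the taint.

```shell
#!/usr/bin/env bash
set -euo pipefail

NODE="new-node-1"  # placeholder node name

# 1. Taint the freshly joined node so only pods tolerating the taint
#    (e.g. the Canal DaemonSet) can schedule onto it.
kubectl taint nodes "$NODE" node.example/network-unavailable=true:NoSchedule

# 2. Poll until the Canal pod on that node reports Running/Ready.
#    (Assumes Canal pods carry the label k8s-app=canal.)
until kubectl -n kube-system get pods -l k8s-app=canal -o wide \
      | grep "$NODE" | grep -q "Running"; do
  sleep 5
done

# 3. Remove the taint (note the trailing "-") so normal workloads
#    can now be scheduled on the node.
kubectl taint nodes "$NODE" node.example/network-unavailable:NoSchedule-
```
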
-- Mia
kubernetes

1 Answer

7/13/2017

This is a common issue, so I'll post the answer anyway.

The behaviour is normal, especially in a self-hosted k8s cluster. In a self-hosted environment, all deployments, including the control plane components (e.g. kube-apiserver, Canal), are scheduled at the same time.

The failed pods should eventually start properly once the control plane is running: k8s will keep restarting failed pods until they come up.

To make Canal start first, its manifest can be deployed on the k8s node together with the other control plane manifests (e.g. kube-apiserver, kube-controller-manager). These are usually found in /etc/kubernetes/manifests, but the path is completely arbitrary. However, if Canal takes too long to become ready, the same error will still appear.
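Concretely, the idea is that the kubelet runs everything in its static manifest directory as static pods, so dropping a Canal manifest alongside the others makes the kubelet start it with the control plane. A rough sketch, assuming the default directory and a canal.yaml manifest you already have (the filenames below are illustrative):

```shell
# Static pod manifests the kubelet watches (path set by the kubelet's
# --pod-manifest-path flag; /etc/kubernetes/manifests is the common default):
ls /etc/kubernetes/manifests
#   kube-apiserver.yaml  kube-controller-manager.yaml  kube-scheduler.yaml

# Place the Canal manifest next to them; the kubelet picks it up
# automatically and runs it as a static pod on this node.
cp canal.yaml /etc/kubernetes/manifests/
```

Note that Canal is normally deployed as a DaemonSet, so a static pod version of the manifest would need per-node adjustments; this only illustrates where the manifest would live.
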

-- Eugene Chow
Source: StackOverflow