Cannot get kube-dns to start on Kubernetes

2/3/2017

Hoping someone can help. I have a 3x node CoreOS cluster running Kubernetes. The nodes are as follows:

192.168.1.201 - Controller
192.168.1.202 - Worker Node
192.168.1.203 - Worker Node

The cluster is up and running, and I can run the following commands:

> kubectl get nodes

NAME            STATUS                     AGE
192.168.1.201   Ready,SchedulingDisabled   1d
192.168.1.202   Ready                      21h
192.168.1.203   Ready                      21h

> kubectl get pods --namespace=kube-system

NAME                                    READY     STATUS             RESTARTS   AGE
kube-apiserver-192.168.1.201            1/1       Running            2          1d
kube-controller-manager-192.168.1.201   1/1       Running            4          1d
kube-dns-v20-h4w7m                      2/3       CrashLoopBackOff   15         23m
kube-proxy-192.168.1.201                1/1       Running            2          1d
kube-proxy-192.168.1.202                1/1       Running            1          21h
kube-proxy-192.168.1.203                1/1       Running            1          21h
kube-scheduler-192.168.1.201            1/1       Running            4          1d

As you can see, the kube-dns service is not running correctly. It keeps restarting and I am struggling to understand why. Any help in debugging this would be greatly appreciated (or pointers to where I can read about debugging this). Running kubectl logs does not bring anything back... not sure if the addons function differently to standard pods.
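For example, kube-dns v20 is a multi-container pod, so kubectl logs has to be pointed at a specific container with -c (the kubedns container name comes from the describe output below; --previous should show output from the last killed instance):

kubectl logs kube-dns-v20-h4w7m --namespace=kube-system -c kubedns
kubectl logs kube-dns-v20-h4w7m --namespace=kube-system -c kubedns --previous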

Running kubectl describe pods, I can see the containers are being killed due to being unhealthy:

16m           16m             1       {kubelet 192.168.1.203} spec.containers{kubedns}        Normal          Created         Created container with docker id 189afaa1eb0d; Security:[seccomp=unconfined]
16m           16m             1       {kubelet 192.168.1.203} spec.containers{kubedns}        Normal          Started         Started container with docker id 189afaa1eb0d
14m           14m             1       {kubelet 192.168.1.203} spec.containers{kubedns}        Normal          Killing         Killing container with docker id 189afaa1eb0d: pod "kube-dns-v20-h4w7m_kube-system(3a545c95-ea19-11e6-aa7c-52540021bfab)" container "kubedns" is unhealthy, it will be killed and re-created

Please find a full output of this command as a github gist here: https://gist.github.com/mehstg/0b8016f5398a8781c3ade8cf49c02680

Thanks in advance!

-- mehstg
coreos
dns
kubernetes

3 Answers

5/22/2017

After following the steps in the official kubeadm doc with flannel networking, I ran into a similar issue:

http://janetkuo.github.io/docs/getting-started-guides/kubeadm/

The networking pods were stuck in error states:

kube-dns-xxxxxxxx-xxxvn (rpc error)

kube-flannel-ds-xxxxx (CrashLoopBackOff)

kube-flannel-ds-xxxxx (CrashLoopBackOff)

kube-flannel-ds-xxxxx (CrashLoopBackOff)

In my case it was related to RBAC permission errors and was resolved by running:

kubectl create -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel-rbac.yml

Afterwards, all kube-system pods went into the Running state. The upstream issue is discussed on GitHub: https://github.com/kubernetes/kubernetes/issues/44029
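To verify the fix, the flannel pods' logs can be checked for the RBAC denials and the pod list watched until everything settles (the app=flannel label selector is an assumption based on the labels in the CoreOS kube-flannel.yml manifest):

kubectl logs --namespace=kube-system -l app=flannel
kubectl get pods --namespace=kube-system -w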

-- Zhenhua
Source: StackOverflow

2/5/2017

As your gist shows, your pod network seems to be broken. You are using a custom pod network on 10.10.10.X. You need to communicate these IPs consistently to all components.

Please check that there is no collision with other existing networks.
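One way to spot an overlap is to compare the routes already present on each node with the pod CIDR the cluster was configured with, for example (podCIDR may be empty if your setup does not use node CIDR allocation):

ip route show
kubectl get nodes -o jsonpath='{.items[*].spec.podCIDR}'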

I recommend setting up with Calico, as that was the solution that got my CoreOS Kubernetes cluster working.

-- David Steiman
Source: StackOverflow

2/5/2017

If you installed your cluster with kubeadm, you should add a pod network after installing.

If you chose flannel as your pod network, you should have included this argument in your init command: kubeadm init --pod-network-cidr 10.244.0.0/16.

The flannel YAML file can be found in the CoreOS flannel repo.

If your cluster was initialized properly (see above), all you need to do is run:

kubectl create -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

Once this is up and running (it will create pods on every node), your kube-dns pod should come up.
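To confirm the flannel pods landed on every node, the wide output shows which node each pod is scheduled on:

kubectl get pods --namespace=kube-system -o wide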

If you need to reset your installation (for example to add the argument to kubeadm init), you can use kubeadm reset on all nodes.

Normally, you would run the init command on the master, then add a pod network, and then add your other nodes.
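As a rough sketch of that sequence (the token and address below are placeholders; kubeadm init prints the exact join command to use):

kubeadm init --pod-network-cidr 10.244.0.0/16
kubectl create -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
kubeadm join --token <token> <master-ip>:<port>

The join command is run on each worker node.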

This is all described in more detail in the Getting started guide, step 3/4 regarding the pod network.

-- Morten Steen Rasmussen
Source: StackOverflow