Hoping someone can help. I have a 3x node CoreOS cluster running Kubernetes. The nodes are as follows: 192.168.1.201 - Controller 192.168.1.202 - Worker Node 192.168.1.203 - Worker Node
The cluster is up and running, and I can run the following commands:
> kubectl get nodes
NAME STATUS AGE
192.168.1.201 Ready,SchedulingDisabled 1d
192.168.1.202 Ready 21h
192.168.1.203 Ready 21h
> kubectl get pods --namespace=kube-system
NAME READY STATUS RESTARTS AGE
kube-apiserver-192.168.1.201 1/1 Running 2 1d
kube-controller-manager-192.168.1.201 1/1 Running 4 1d
kube-dns-v20-h4w7m 2/3 CrashLoopBackOff 15 23m
kube-proxy-192.168.1.201 1/1 Running 2 1d
kube-proxy-192.168.1.202 1/1 Running 1 21h
kube-proxy-192.168.1.203 1/1 Running 1 21h
kube-scheduler-192.168.1.201 1/1 Running 4 1d
As you can see, the kube-dns service is not running correctly. It keeps restarting and I am struggling to understand why. Any help in debugging this would be greatly appreciated (or pointers at where to read about debugging this. Running kubectl logs does not bring anything back...not sure if the addons function differently to standard pods.
Running a kubectl describe pods, I can see the containers are killed due to being unhealthy:
16m 16m 1 {kubelet 192.168.1.203} spec.containers{kubedns} Normal Created Created container with docker id 189afaa1eb0d; Security:[seccomp=unconfined]
16m 16m 1 {kubelet 192.168.1.203} spec.containers{kubedns} Normal Started Started container with docker id 189afaa1eb0d
14m 14m 1 {kubelet 192.168.1.203} spec.containers{kubedns} Normal Killing Killing container with docker id 189afaa1eb0d: pod "kube-dns-v20-h4w7m_kube-system(3a545c95-ea19-11e6-aa7c-52540021bfab)" container "kubedns" is unhealthy, it will be killed and re-created
Please find a full output of this command as a github gist here: https://gist.github.com/mehstg/0b8016f5398a8781c3ade8cf49c02680
Thanks in advance!
After followed the steps in the official kubeadm doc with flannel networking, I run into a similar issue
http://janetkuo.github.io/docs/getting-started-guides/kubeadm/
It appears as networking pods get stuck in error states:
kube-dns-xxxxxxxx-xxxvn (rpc error)
kube-flannel-ds-xxxxx (CrashLoopBackOff)
kube-flannel-ds-xxxxx (CrashLoopBackOff)
kube-flannel-ds-xxxxx (CrashLoopBackOff)
In my case it is related to rbac permission errors and is resolved by running
kubectl create -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel-rbac.yml
Afterwards, all kube-system pods went into running states. The upstream issue is discussed on github https://github.com/kubernetes/kubernetes/issues/44029
as your gist says your pod network seems to be broken. You are using some custom podnetwork with 10.10.10.X. You should communicate this IPs to all components.
Please check, there is no collision with other existing nets.
I recommend you to setup with Calico, as this was the solution for me to bring up CoreOS k8s up working
If you installed your cluster with kubeadm you should add a pod network after installing.
If you choose flannel as your pod network, you should have this argument in your init command kubeadm init --pod-network-cidr 10.244.0.0/16
.
The flannel YAML file can be found in the coreOS flannel repo.
All you need to do if your cluster was initialized properly (read above), is to run kubectl create -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
Once this is up and running (it will create pods on every node), your kube-dns pod should come up.
If you need to reset your installation (for example to add the argument to kubeadm init
), you can use kubeadm reset
on all nodes.
Normally, you would run the init command on the master, then add a pod network, and then add your other nodes.
This is all described in more detail in the Getting started guide, step 3/4 regarding the pod network.