ICP 2.1.0.3 Install Timeout: FAILED - RETRYING: Waiting for kube-dns to start

7/8/2018

Looks like issues is because of CNI (calico) but not sure what is the fix in ICP (see journalctl -u kubelet logs below)

ICP Installer Log:

FAILED! => {"attempts": 100, "changed": true, "cmd": "kubectl -n kube-system get daemonset kube-dns -o=custom-columns=A:.status.numberAvailable,B:.status.desiredNumberScheduled --no-headers=true | tr -s \" \" | awk '$1 == $2 {print \"READY\"}'", "delta": "0:00:00.403489", "end": "2018-07-08 09:04:21.922839", "rc": 0, "start": "2018-07-08 09:04:21.519350", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

journalctl -u kubelet:

Jul 08 22:40:38 dev-master hyperkube[2763]: E0708 22:40:38.548157    2763 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:459: Failed to list *v1.Node: nodes is forbidden: User "kubelet" cannot list nodes at the cluster scope
Jul 08 22:40:38 dev-master hyperkube[2763]: E0708 22:40:38.549872    2763 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: pods is forbidden: User "kubelet" cannot list pods at the cluster scope
Jul 08 22:40:38 dev-master hyperkube[2763]: E0708 22:40:38.555379    2763 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:450: Failed to list *v1.Service: services is forbidden: User "kubelet" cannot list services at the cluster scope
Jul 08 22:40:38 dev-master hyperkube[2763]: E0708 22:40:38.738411    2763 event.go:200] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"k8s-master-10.50.50.201.153f85e7528e5906", GenerateName:"", Namespace:"kube-system", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:""}, InvolvedObject:v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"k8s-master-10.50.50.201", UID:"b0ed63e50c3259666286e5a788d12b81", APIVersion:"v1", ResourceVersion:"", FieldPath:"spec.containers{scheduler}"}, Reason:"Started", Message:"Started container", Source:v1.EventSource{Component:"kubelet", Host:"10.50.50.201"}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xbec8c296b58a5506, ext:106413065445, loc:(*time.Location)(0xb58e300)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xbec8c296b58a5506, ext:106413065445, loc:(*time.Location)(0xb58e300)}}, Count:1, Type:"Normal", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events is forbidden: User "kubelet" cannot create events in the namespace "kube-system"' (will not retry!)

Jul 08 22:40:43 dev-master hyperkube[2763]: E0708 22:40:43.938806    2763 kubelet.go:2130] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
    Jul 08 22:40:44 dev-master hyperkube[2763]: E0708 22:40:44.556337    2763 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:459: Failed to list *v1.Node: nodes is forbidden: User "kubelet" cannot list nodes at the cluster scope
    Jul 08 22:40:44 dev-master hyperkube[2763]: E0708 22:40:44.557513    2763 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: pods is forbidden: User "kubelet" cannot list pods at the cluster scope
    Jul 08 22:40:44 dev-master hyperkube[2763]: E0708 22:40:44.561007    2763 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:450: Failed to list *v1.Service: services is forbidden: User "kubelet" cannot list services at the cluster scope
    Jul 08 22:40:45 dev-master hyperkube[2763]: E0708 22:40:45.557584    2763 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:459: Failed to list *v1.Node: nodes is forbidden: User "kubelet" cannot list nodes at the cluster scope
    Jul 08 22:40:45 dev-master hyperkube[2763]: E0708 22:40:45.558375    2763 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: pods is forbidden: User "kubelet" cannot list pods at the cluster scope
    Jul 08 22:40:45 dev-master hyperkube[2763]: E0708 22:40:45.561807    2763 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:450: Failed to list *v1.Service: services is forbidden: User "kubelet" cannot list services at the cluster scope
    Jul 08 22:40:46 dev-master hyperkube[2763]: I0708 22:40:46.393905    2763 kubelet_node_status.go:289] Setting node annotation to enable volume controller attach/detach
    Jul 08 22:40:46 dev-master hyperkube[2763]: I0708 22:40:46.396261    2763 kubelet_node_status.go:83] Attempting to register node 10.50.50.201
    Jul 08 22:40:46 dev-master hyperkube[2763]: E0708 22:40:46.397540    2763 kubelet_node_status.go:107] Unable to register node "10.50.50.201" with API server: nodes is forbidden: User "kubelet" cannot create nodes at the cluster scope

Jul 08 19:43:48 dev-master hyperkube[9689]: E0708 19:43:48.161949    9689 cni.go:259] Error adding network: no configured Calico pools
Jul 08 19:43:48 dev-master hyperkube[9689]: E0708 19:43:48.161980    9689 cni.go:227] Error while adding to cni network: no configured Calico pools
Jul 08 19:43:48 dev-master hyperkube[9689]: E0708 19:43:48.468392    9689 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "kube-dns-splct_kube-system" network: no configured Calico
Jul 08 19:43:48 dev-master hyperkube[9689]: E0708 19:43:48.468455    9689 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "kube-dns-splct_kube-system(113e64b2-82e6-11e8-83bb-0242a9e42805)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up 
Jul 08 19:43:48 dev-master hyperkube[9689]: E0708 19:43:48.468479    9689 kuberuntime_manager.go:646] createPodSandbox for pod "kube-dns-splct_kube-system(113e64b2-82e6-11e8-83bb-0242a9e42805)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up
Jul 08 19:43:48 dev-master hyperkube[9689]: E0708 19:43:48.468556    9689 pod_workers.go:186] Error syncing pod 113e64b2-82e6-11e8-83bb-0242a9e42805 ("kube-dns-splct_kube-system(113e64b2-82e6-11e8-83bb-0242a9e42805)"), skipping: failed to "CreatePodSandbox" for "kube-d
Jul 08 19:43:48 dev-master hyperkube[9689]: I0708 19:43:48.938222    9689 kuberuntime_manager.go:513] Container {Name:calico-node Image:ibmcom/calico-node:v3.0.4 Command:[] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[{Name:ETCD_ENDPOINTS Value: ValueFrom:&EnvVarSource
Jul 08 19:43:48 dev-master hyperkube[9689]: e:FELIX_HEALTHENABLED Value:true ValueFrom:nil} {Name:IP_AUTODETECTION_METHOD Value:can-reach=10.50.50.201 ValueFrom:nil}] Resources:{Limits:map[] Requests:map[]} VolumeMounts:[{Name:lib-modules ReadOnly:true MountPath:/lib/m
Jul 08 19:43:48 dev-master hyperkube[9689]: I0708 19:43:48.938449    9689 kuberuntime_manager.go:757] checking backoff for container "calico-node" in pod "calico-node-wpln7_kube-system(10107b3e-82e6-11e8-83bb-0242a9e42805)"
Jul 08 19:43:48 dev-master hyperkube[9689]: I0708 19:43:48.938699    9689 kuberuntime_manager.go:767] Back-off 5m0s restarting failed container=calico-node pod=calico-node-wpln7_kube-system(10107b3e-82e6-11e8-83bb-0242a9e42805)
Jul 08 19:43:48 dev-master hyperkube[9689]: E0708 19:43:48.938735    9689 pod_workers.go:186] Error syncing pod 10107b3e-82e6-11e8-83bb-0242a9e42805 ("calico-node-wpln7_kube-system(10107b3e-82e6-11e8-83bb-0242a9e42805)"), skipping: failed to "StartContainer" for "calic
lines 4918-4962/4962 (END)

docker ps (master node): Container-> k8s_POD_kube-dns-splct_kube-system-* is repeatedly crashing.

CONTAINER ID        IMAGE                            COMMAND                  CREATED             STATUS                  PORTS               NAMES
ed24d636fdd1        ibmcom/pause:3.0                 "/pause"                 1 second ago        Up Less than a second                       k8s_POD_kube-dns-splct_kube-system_113e64b2-82e6-11e8-83bb-0242a9e42805_121
49b648837900        ibmcom/calico-cni                "/install-cni.sh"        5 minutes ago       Up 5 minutes                                k8s_install-cni_calico-node-wpln7_kube-system_10107b3e-82e6-11e8-83bb-0242a9e42805_0
933ff30177de        ibmcom/calico-kube-controllers   "/usr/bin/kube-contr…"   5 minutes ago       Up 5 minutes                                k8s_calico-kube-controllers_calico-kube-controllers-759f7fc556-mm5tg_kube-system_1010712e-82e6-11e8-83bb-0242a9e42805_0
12e9262299af        ibmcom/pause:3.0                 "/pause"                 6 minutes ago       Up 5 minutes                                k8s_POD_calico-kube-controllers-759f7fc556-mm5tg_kube-system_1010712e-82e6-11e8-83bb-0242a9e42805_0
8dcb2b2b3eb5        ibmcom/pause:3.0                 "/pause"                 6 minutes ago       Up 5 minutes                                k8s_POD_calico-node-wpln7_kube-system_10107b3e-82e6-11e8-83bb-0242a9e42805_0
9486ff78df31        ibmcom/tiller                    "/tiller"                6 minutes ago       Up 6 minutes                                k8s_tiller_tiller-deploy-c59888d97-7nwph_kube-system_016019ab-82e6-11e8-83bb-0242a9e42805_0
e5588f68af1b        ibmcom/pause:3.0                 "/pause"                 6 minutes ago       Up 6 minutes                                k8s_POD_tiller-deploy-c59888d97-7nwph_kube-system_016019ab-82e6-11e8-83bb-0242a9e42805_0
e80460d857ff        ibmcom/icp-image-manager         "/icp-image-manager …"   10 minutes ago      Up 10 minutes                               k8s_image-manager_image-manager-0_kube-system_7b7554ce-82e5-11e8-83bb-0242a9e42805_0
e207175f19b7        ibmcom/registry                  "/entrypoint.sh /etc…"   10 minutes ago      Up 10 minutes                               k8s_icp-registry_image-manager-0_kube-system_7b7554ce-82e5-11e8-83bb-0242a9e42805_0
477faf0668f3        ibmcom/pause:3.0                 "/pause"                 10 minutes ago      Up 10 minutes                               k8s_POD_image-manager-0_kube-system_7b7554ce-82e5-11e8-83bb-0242a9e42805_0
8996bb8c37b7        d4b6454d4873                     "/hyperkube schedule…"   10 minutes ago      Up 10 minutes                               k8s_scheduler_k8s-master-10.50.50.201_kube-system_9e5bce1f08c050be21fa6380e4e363cc_0
835ee941432c        d4b6454d4873                     "/hyperkube apiserve…"   10 minutes ago      Up 10 minutes                               k8s_apiserver_k8s-master-10.50.50.201_kube-system_9e5bce1f08c050be21fa6380e4e363cc_0
de409ff63cb2        d4b6454d4873                     "/hyperkube controll…"   10 minutes ago      Up 10 minutes                               k8s_controller-manager_k8s-master-10.50.50.201_kube-system_9e5bce1f08c050be21fa6380e4e363cc_0
716032a308ea        ibmcom/pause:3.0                 "/pause"                 10 minutes ago      Up 10 minutes                               k8s_POD_k8s-master-10.50.50.201_kube-system_9e5bce1f08c050be21fa6380e4e363cc_0
bd9e64e3d6a2        d4b6454d4873                     "/hyperkube proxy --…"   12 minutes ago      Up 12 minutes                               k8s_proxy_k8s-proxy-10.50.50.201_kube-system_3e068267cfe8f990cd2c9a4635be044d_0
bab3c9ef7e40        ibmcom/pause:3.0                 "/pause"                 12 minutes ago      Up 12 minutes                               k8s_POD_k8s-proxy-10.50.50.201_kube-system_3e068267cfe8f990cd2c9a4635be044d_0

Kubectl (master node): I believe kube should have been initialized and running by this time but seems like it is not.

kubectl get pods -s 127.0.0.1:8888 --all-namespaces
The connection to the server 127.0.0.1:8888 was refused - did you specify the right host or port?

Following are the options I tried:

  1. Create cluster with both IP_IP enabled and disabled. As all nodes are on same subnet, IP_IP setup should not have impact.
  2. Etcd running on a separate node and as part of master node
  3. ifconfig tunl0 returns following (i.e. w/o IP assignment) in all of the above scenarios :

    tunl0 Link encap:IPIP Tunnel HWaddr

        NOARP  MTU:1480  Metric:1
        RX packets:0 errors:0 dropped:0 overruns:0 frame:0
        TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000
        RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
  4. 'calicoctl get profile' returns empty and so does 'calicoctl get nodes' which I believe is because, calico is not configured yet.

Other checks, thoughts and options?

Calico Kube Contoller Logs (repeated):

2018-07-09 05:46:08.440 [WARNING][1] cache.go 278: Value for key has changed, queueing update to reprogram key="kns.default" type=v3.Profile
2018-07-09 05:46:08.440 [WARNING][1] cache.go 278: Value for key has changed, queueing update to reprogram key="kns.kube-public" type=v3.Profile
2018-07-09 05:46:08.440 [WARNING][1] cache.go 278: Value for key has changed, queueing update to reprogram key="kns.kube-system" type=v3.Profile
2018-07-09 05:46:08.440 [INFO][1] namespace_controller.go 223: Create/Update Profile in Calico datastore key="kns.default"
2018-07-09 05:46:08.441 [INFO][1] namespace_controller.go 246: Update Profile in Calico datastore with resource version  key="kns.default"
2018-07-09 05:46:08.442 [INFO][1] namespace_controller.go 252: Successfully updated profile key="kns.default"
2018-07-09 05:46:08.442 [INFO][1] namespace_controller.go 223: Create/Update Profile in Calico datastore key="kns.kube-public"
2018-07-09 05:46:08.446 [INFO][1] namespace_controller.go 246: Update Profile in Calico datastore with resource version  key="kns.kube-public"
2018-07-09 05:46:08.447 [INFO][1] namespace_controller.go 252: Successfully updated profile key="kns.kube-public"
2018-07-09 05:46:08.447 [INFO][1] namespace_controller.go 223: Create/Update Profile in Calico datastore key="kns.kube-system"
2018-07-09 05:46:08.465 [INFO][1] namespace_controller.go 246: Update Profile in Calico datastore with resource version  key="kns.kube-system"
2018-07-09 05:46:08.476 [INFO][1] namespace_controller.go 252: Successfully updated profile key="kns.kube-system"
-- Hari
ibm-cloud-private
kubernetes

3 Answers

8/13/2018

Well issue seemed to be with the Docker storage driver (btrfs) that I was using. Once I switched to 'Overlay', I was able to move forward.

-- Hari
Source: StackOverflow

8/15/2018

I had the same experience, at least the same high level error message:

"FAILED - RETRYING: Waiting for kube-dns to start".

I had to do 2 things to pass this installation step:

  • change the hostname (and the entry in my /etc/hosts) to lowercase. Calico doesn't like uppercase
  • comment the localhost entry in /etc/hosts (#127.0.0.1 localhost.localdomain localhost)

After doing it, installation completed fine.

-- gxvigo
Source: StackOverflow

7/9/2018

Firstly, from ICP 2.1.0.3, the insecure port 8888 for K8S apiserver is disabled, so you can not use this insecure port to talk to Kubenetes.

For this issue, could you let me know the below information or outputs.

  • The network configurations in your environment. -> ifconfig -a
  • The route table: -> route
  • The contents of your /ect/hosts file: -> cat /etc/hosts
  • The ICP installation configurations files. -> config.yaml & hosts
-- AsirXing
Source: StackOverflow