kube-controller-manager & kube-scheduler in CrashLoopBackOff in Kubernetes

5/13/2019

I'm using Calico as the CNI in my Kubernetes cluster, and I'm trying to deploy a single-master cluster across 3 servers with kubeadm, following the official setup guide. However, kube-controller-manager and kube-scheduler keep going into CrashLoopBackOff and cannot run properly.

I have already tried kubeadm reset on every server, rebooting the servers, and downgrading Docker.

I used kubeadm init --apiserver-advertise-address=192.168.213.128 --pod-network-cidr=192.168.0.0/16 to initialize the master, then ran kubectl apply -f https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/rbac-kdd.yaml and kubectl apply -f https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml to install Calico.
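
Collected in one place, the commands I ran on the master were:

     kubeadm init --apiserver-advertise-address=192.168.213.128 --pod-network-cidr=192.168.0.0/16
     kubectl apply -f https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/rbac-kdd.yaml
     kubectl apply -f https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml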

[root@k8s-master ~]# docker info
Containers: 20
 Running: 18
 Paused: 0
 Stopped: 2
Images: 10
Server Version: 18.09.6
Storage Driver: overlay2
 Backing Filesystem: xfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: systemd
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: bb71b10fd8f58240ca47fbb579b9d1028eea7c84
runc version: 2b18fe1d885ee5083ef9f0838fee39b62d653e30
init version: fec3683
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-957.12.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 972.6MiB
Name: k8s-master
ID: RN6I:PP52:4WTU:UP7E:T3LF:MXVZ:EDBX:RSII:BIRW:36O2:CYJ3:FRV2
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Registry Mirrors:
 https://i70c3eqq.mirror.aliyuncs.com/
 https://docker.mirrors.ustc.edu.cn/
Live Restore Enabled: false
Product License: Community Engine
[root@k8s-master ~]# kubectl version
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.1", GitCommit:"b7394102d6ef778017f2ca4046abbaa23b88c290", GitTreeState:"clean", BuildDate:"2019-04-08T17:11:31Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.1", GitCommit:"b7394102d6ef778017f2ca4046abbaa23b88c290", GitTreeState:"clean", BuildDate:"2019-04-08T17:02:58Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
[root@k8s-master ~]# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.1", GitCommit:"b7394102d6ef778017f2ca4046abbaa23b88c290", GitTreeState:"clean", BuildDate:"2019-04-08T17:08:49Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
[root@k8s-master ~]# kubelet --version
Kubernetes v1.14.1
[root@k8s-master ~]# kubectl get no -A
NAME         STATUS   ROLES    AGE   VERSION
k8s-master   Ready    master   49m   v1.14.1
[root@k8s-master ~]# kubectl get pods -A
NAMESPACE     NAME                                 READY   STATUS             RESTARTS   AGE
kube-system   calico-node-xmc5t                    2/2     Running            0          27m
kube-system   coredns-6765558d84-945mt             1/1     Running            0          28m
kube-system   coredns-6765558d84-xz7lw             1/1     Running            0          28m
kube-system   coredns-fb8b8dccf-z87sl              1/1     Running            0          31m
kube-system   etcd-k8s-master                      1/1     Running            0          30m
kube-system   kube-apiserver-k8s-master            1/1     Running            0          29m
kube-system   kube-controller-manager-k8s-master   0/1     CrashLoopBackOff   8          30m
kube-system   kube-proxy-wp7n9                     1/1     Running            0          31m
kube-system   kube-scheduler-k8s-master            1/1     Running            7          29m
[root@k8s-master ~]# kubectl logs -n kube-system kube-controller-manager-k8s-master
I0513 13:49:51.836448       1 serving.go:319] Generated self-signed cert in-memory
I0513 13:49:52.988794       1 controllermanager.go:155] Version: v1.14.1
I0513 13:49:53.003873       1 secure_serving.go:116] Serving securely on 127.0.0.1:10257
I0513 13:49:53.005146       1 deprecated_insecure_serving.go:51] Serving insecurely on [::]:10252
I0513 13:49:53.008661       1 leaderelection.go:217] attempting to acquire leader lease  kube-system/kube-controller-manager...
I0513 13:50:12.687383       1 leaderelection.go:227] successfully acquired lease kube-system/kube-controller-manager
I0513 13:50:12.700344       1 event.go:209] Event(v1.ObjectReference{Kind:"Endpoints", Namespace:"kube-system", Name:"kube-controller-manager", UID:"39adc911-7582-11e9-a70e-000c2908c796", APIVersion:"v1", ResourceVersion:"1706", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' k8s-master_fbfa0502-7585-11e9-9939-000c2908c796 became leader
I0513 13:50:13.131264       1 plugins.go:103] No cloud provider specified.
I0513 13:50:13.166088       1 controller_utils.go:1027] Waiting for caches to sync for tokens controller
I0513 13:50:13.368381       1 controllermanager.go:497] Started "podgc"
I0513 13:50:13.368666       1 gc_controller.go:76] Starting GC controller
I0513 13:50:13.368697       1 controller_utils.go:1027] Waiting for caches to sync for GC controller
I0513 13:50:13.368717       1 controller_utils.go:1034] Caches are synced for tokens controller
I0513 13:50:13.453276       1 controllermanager.go:497] Started "attachdetach"
I0513 13:50:13.453534       1 attach_detach_controller.go:323] Starting attach detach controller
I0513 13:50:13.453545       1 controller_utils.go:1027] Waiting for caches to sync for attach detach controller
I0513 13:50:13.461756       1 controllermanager.go:497] Started "clusterrole-aggregation"
I0513 13:50:13.461833       1 clusterroleaggregation_controller.go:148] Starting ClusterRoleAggregator
I0513 13:50:13.461849       1 controller_utils.go:1027] Waiting for caches to sync for ClusterRoleAggregator controller
I0513 13:50:13.517257       1 controllermanager.go:497] Started "endpoint"
I0513 13:50:13.525394       1 endpoints_controller.go:166] Starting endpoint controller
I0513 13:50:13.525425       1 controller_utils.go:1027] Waiting for caches to sync for endpoint controller
I0513 13:50:14.151371       1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for rolebindings.rbac.authorization.k8s.io
I0513 13:50:14.151463       1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for leases.coordination.k8s.io
I0513 13:50:14.151489       1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for limitranges
I0513 13:50:14.163632       1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for ingresses.extensions
I0513 13:50:14.163695       1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for daemonsets.apps
I0513 13:50:14.163721       1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for ingresses.networking.k8s.io
I0513 13:50:14.163742       1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for poddisruptionbudgets.policy
W0513 13:50:14.163757       1 shared_informer.go:311] resyncPeriod 67689210101997 is smaller than resyncCheckPeriod 86008177281797 and the informer has already started. Changing it to 86008177281797
I0513 13:50:14.163840       1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for networkpolicies.networking.k8s.io
W0513 13:50:14.163848       1 shared_informer.go:311] resyncPeriod 64017623179979 is smaller than resyncCheckPeriod 86008177281797 and the informer has already started. Changing it to 86008177281797
I0513 13:50:14.163867       1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for serviceaccounts
I0513 13:50:14.163885       1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for deployments.extensions
I0513 13:50:14.163911       1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for daemonsets.extensions
I0513 13:50:14.163925       1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for controllerrevisions.apps
I0513 13:50:14.163942       1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for roles.rbac.authorization.k8s.io
I0513 13:50:14.163965       1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for podtemplates
I0513 13:50:14.163994       1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for cronjobs.batch
I0513 13:50:14.164004       1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for endpoints
I0513 13:50:14.164019       1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for replicasets.extensions
I0513 13:50:14.164030       1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for replicasets.apps
I0513 13:50:14.164039       1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for deployments.apps
I0513 13:50:14.164054       1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for jobs.batch
I0513 13:50:14.164079       1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for statefulsets.apps
I0513 13:50:14.164097       1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for events.events.k8s.io
I0513 13:50:14.164115       1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for horizontalpodautoscalers.autoscaling
E0513 13:50:14.164139       1 resource_quota_controller.go:171] initial monitor sync has error: [couldn't start monitor for resource "extensions/v1beta1, Resource=networkpolicies": unable to monitor quota for resource "extensions/v1beta1, Resource=networkpolicies", couldn't start monitor for resource "crd.projectcalico.org/v1, Resource=networkpolicies": unable to monitor quota for resource "crd.projectcalico.org/v1, Resource=networkpolicies"]
I0513 13:50:14.164154       1 controllermanager.go:497] Started "resourcequota"
I0513 13:50:14.171002       1 resource_quota_controller.go:276] Starting resource quota controller
I0513 13:50:14.171096       1 controller_utils.go:1027] Waiting for caches to sync for resource quota controller
I0513 13:50:14.171138       1 resource_quota_monitor.go:301] QuotaMonitor running
I0513 13:50:15.776814       1 controllermanager.go:497] Started "job"
I0513 13:50:15.771658       1 job_controller.go:143] Starting job controller
I0513 13:50:15.807719       1 controller_utils.go:1027] Waiting for caches to sync for job controller
I0513 13:50:23.065972       1 controllermanager.go:497] Started "csrcleaner"
I0513 13:50:23.047495       1 cleaner.go:81] Starting CSR cleaner controller
I0513 13:50:25.019036       1 event.go:209] Event(v1.ObjectReference{Kind:"Endpoints", Namespace:"kube-system", Name:"kube-controller-manager", UID:"39adc911-7582-11e9-a70e-000c2908c796", APIVersion:"v1", ResourceVersion:"1706", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' k8s-master_fbfa0502-7585-11e9-9939-000c2908c796 stopped leading
I0513 13:50:25.125784       1 leaderelection.go:263] failed to renew lease kube-system/kube-controller-manager: failed to tryAcquireOrRenew context deadline exceeded
F0513 13:50:25.189307       1 controllermanager.go:260] leaderelection lost
[root@k8s-master ~]# kubectl logs -n kube-system kube-scheduler-k8s-master
I0513 14:16:04.350818       1 serving.go:319] Generated self-signed cert in-memory
W0513 14:16:06.203477       1 authentication.go:387] failed to read in-cluster kubeconfig for delegated authentication: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory
W0513 14:16:06.215933       1 authentication.go:249] No authentication-kubeconfig provided in order to lookup client-ca-file in configmap/extension-apiserver-authentication in kube-system, so client certificate authentication won't work.
W0513 14:16:06.215947       1 authentication.go:252] No authentication-kubeconfig provided in order to lookup requestheader-client-ca-file in configmap/extension-apiserver-authentication in kube-system, so request-header client certificate authentication won't work.
W0513 14:16:06.218951       1 authorization.go:177] failed to read in-cluster kubeconfig for delegated authorization: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory
W0513 14:16:06.218983       1 authorization.go:146] No authorization-kubeconfig provided, so SubjectAccessReview of authorization tokens won't work.
I0513 14:16:06.961417       1 server.go:142] Version: v1.14.1
I0513 14:16:06.974064       1 defaults.go:87] TaintNodesByCondition is enabled, PodToleratesNodeTaints predicate is mandatory
W0513 14:16:06.997875       1 authorization.go:47] Authorization is disabled
W0513 14:16:06.997889       1 authentication.go:55] Authentication is disabled
I0513 14:16:06.997908       1 deprecated_insecure_serving.go:49] Serving healthz insecurely on [::]:10251
I0513 14:16:06.998196       1 secure_serving.go:116] Serving securely on 127.0.0.1:10259
I0513 14:16:08.872649       1 controller_utils.go:1027] Waiting for caches to sync for scheduler controller
I0513 14:16:08.973148       1 controller_utils.go:1034] Caches are synced for scheduler controller
I0513 14:16:09.003227       1 leaderelection.go:217] attempting to acquire leader lease  kube-system/kube-scheduler...
I0513 14:16:25.814160       1 leaderelection.go:227] successfully acquired lease kube-system/kube-scheduler

What is causing kube-controller-manager and kube-scheduler to go into CrashLoopBackOff, and how can I get them running reliably?

-- mio leon
kube-controller-manager
kube-scheduler
kubeadm
kubectl
kubernetes

1 Answer

6/12/2019

I have reproduced the steps you listed on a cloud VM and managed to get everything working fine.

Here are a few ideas that might help:

  1. Be sure to meet all the prerequisites listed here (a quick sanity-check sketch follows this list)

  2. Install the most recent version of Docker following the guide from here (choose the proper OS that you use; a CentOS sketch follows this list)

  3. Install kubeadm using the commands below (these are the Debian/Ubuntu steps; on CentOS the yum-based equivalents apply):

     apt-get update && apt-get install -y apt-transport-https curl
     curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
     cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
     deb https://apt.kubernetes.io/ kubernetes-xenial main
     EOF
     apt-get update
     apt-get install -y kubelet kubeadm kubectl
     apt-mark hold kubelet kubeadm kubectl
  4. Make sure you have the latest version of kubeadm by executing: apt-get update && apt-get upgrade

  5. Make sure you use the proper arguments alongside kubeadm init (an example invocation follows this list)

  6. Don't forget to run the following after kubeadm init finishes (these commands are also part of the kubeadm init output):

    • mkdir -p $HOME/.kube

    • sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config

    • sudo chown $(id -u):$(id -g) $HOME/.kube/config

  7. Finally, apply the .yaml files you listed in your question.
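
For point 1, here is a minimal sketch of commands to verify the basic kubeadm prerequisites on each node (at least 2 CPUs, at least 2 GB of RAM, swap disabled); the full list is on the linked prerequisites page:

     nproc             # CPU count: kubeadm expects at least 2
     free -h           # total memory: kubeadm expects at least 2 GB; the Swap line should show 0
     cat /proc/swaps   # should list no active swap devices
     swapoff -a        # disables any swap that is still active for the current boot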
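
For point 2, since your master runs CentOS 7, a sketch of installing Docker CE from Docker's official CentOS repository (package names as in Docker's CentOS install guide):

     yum install -y yum-utils
     yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
     yum install -y docker-ce docker-ce-cli containerd.io
     systemctl enable --now docker   # start Docker now and on every boot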
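
For point 5, an example invocation matching your setup; the address and pod CIDR below are taken from your question, and 192.168.0.0/16 is the pod CIDR the Calico v3.3 manifest expects by default:

     kubeadm init \
       --apiserver-advertise-address=192.168.213.128 \
       --pod-network-cidr=192.168.0.0/16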

Notice that by following the above steps you will end up with kubectl version, kubelet --version and kubectl get no -A all reporting v1.14.3 rather than the v1.14.1 you showed, which might be part of the problem.
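
To confirm which versions you ended up with, the same checks you used in the question apply:

     kubeadm version
     kubelet --version
     kubectl version
     kubectl get no -A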

I hope it helps.

-- OhHiMark
Source: StackOverflow