kubelet.service: Service hold-off time over, scheduling restart

12/14/2018

Context

We are currently running a few clusters on v1.8.7 (created months ago by developers who are no longer available) and are trying to upgrade to a higher version. First, we wanted to try the same upgrade on a cluster we use for experiments and POCs.

What we tried

To do that, we tried running a few kubeadm commands on one of the master nodes, but kubeadm was not found.

So we tried installing it with the following commands:

apt-get update && apt-get install -y apt-transport-https curl
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb https://apt.kubernetes.io/ kubernetes-xenial main
EOF
apt-get update
apt-get install -y kubelet kubeadm kubectl
apt-mark hold kubelet kubeadm kubectl
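
For reference, since no version is pinned in these commands, apt installs the latest available packages; the versions that actually landed on the node can be listed with:

# show which versions of the Kubernetes packages were installed
dpkg -l kubelet kubeadm kubectl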

What happened

However, that node now has status NotReady and the kubelet service is failing.

Any pointers on how to fix this, and on what we should have done instead?

root@k8s-master-dev-0:/home/azureuser# kubectl get nodes
NAME                     STATUS     ROLES     AGE       VERSION
k8s-master-dev-0         NotReady   master    118d      v1.8.7
k8s-master-dev-1         Ready      master    118d      v1.8.7
k8s-master-dev-2         Ready      master    163d      v1.8.7
k8s-agents-dev-0         Ready      agent     163d      v1.8.7
k8s-agents-dev-1         Ready      agent     163d      v1.8.7
k8s-agents-dev-2         Ready      agent     163d      v1.8.7

root@k8s-master-dev-0:/home/azureuser# systemctl status kubelet.service
kubelet.service - Kubelet
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: failed (Result: start-limit-hit) since Thu 2018-12-13 14:33:25 UTC; 18h ago

Dec 13 14:33:25 k8s-master-dev-0 systemd[1]: kubelet.service: Control process exited, code=exited status=2
Dec 13 14:33:25 k8s-master-dev-0 systemd[1]: Failed to start Kubelet.
Dec 13 14:33:25 k8s-master-dev-0 systemd[1]: kubelet.service: Unit entered failed state.
Dec 13 14:33:25 k8s-master-dev-0 systemd[1]: kubelet.service: Failed with result 'exit-code'.
Dec 13 14:33:25 k8s-master-dev-0 systemd[1]: kubelet.service: Service hold-off time over, scheduling restart.
Dec 13 14:33:25 k8s-master-dev-0 systemd[1]: Stopped Kubelet.
Dec 13 14:33:25 k8s-master-dev-0 systemd[1]: kubelet.service: Start request repeated too quickly.
Dec 13 14:33:25 k8s-master-dev-0 systemd[1]: Failed to start Kubelet.
Dec 13 14:33:25 k8s-master-dev-0 systemd[1]: kubelet.service: Unit entered failed state.
Dec 13 14:33:25 k8s-master-dev-0 systemd[1]: kubelet.service: Failed with result 'start-limit-hit'.
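
The start-limit-hit result above only says that systemd gave up restarting the unit; the underlying error should be visible in the kubelet's journal, for example:

# show the last kubelet log lines, which should contain the actual startup error
journalctl -u kubelet.service --no-pager -n 50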
-- Kshitij Karandikar
kubeadm
kubectl
kubelet
kubernetes
upgrade

3 Answers

12/14/2018

Is it a clean Kubernetes cluster?

I think you should be careful with installing kubelet, kubeadm, and kubectl on a LIVE Kubernetes cluster.

Here you can find more information about reconfiguring the kubelet in a live cluster: https://kubernetes.io/docs/tasks/administer-cluster/reconfigure-kubelet/
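
In particular, comparing the kubelet version the node now runs against the control-plane version will show how far apart they are; a minimal check with standard commands:

# version the control plane is running
kubectl version --short

# version of the kubelet binary now installed on the node
kubelet --version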

Can you show me the output of:

kubectl get all --namespace kube-system
-- wrogrammer
Source: StackOverflow

12/14/2018

@wrogrammer

root@k8s-master-dev-0:/var/log/apt# kubectl get all --namespace kube-system
NAME            DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR                 AGE
ds/kube-proxy   6         6         5         6            5           beta.kubernetes.io/os=linux   164d

NAME                          DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deploy/heapster               1         1         1            1           164d
deploy/kube-dns-v20           2         2         2            2           164d
deploy/kubernetes-dashboard   1         1         1            1           164d
deploy/tiller-deploy          1         1         1            1           164d

NAME                                 DESIRED   CURRENT   READY     AGE
rs/heapster-75f8df9884               1         1         1         164d
rs/heapster-7d6ffbf65                0         0         0         164d
rs/kube-dns-v20-5d9fdc7448           2         2         2         164d
rs/kubernetes-dashboard-8555bd85db   1         1         1         164d
rs/tiller-deploy-6677dc8d46          1         1         1         163d
rs/tiller-deploy-86d6cf59b           0         0         0         164d

NAME                                          READY     STATUS     RESTARTS   AGE
po/heapster-75f8df9884-nxn2z                  2/2       Running    0          37d
po/kube-addon-manager-k8s-master-dev-0        1/1       Unknown    4          30d
po/kube-addon-manager-k8s-master-dev-1        1/1       Running    4          118d
po/kube-addon-manager-k8s-master-dev-2        1/1       Running    2          164d
po/kube-apiserver-k8s-master-dev-0            1/1       Unknown    4          30d
po/kube-apiserver-k8s-master-dev-1            1/1       Running    4          118d
po/kube-apiserver-k8s-master-dev-2            1/1       Running    2          164d
po/kube-controller-manager-k8s-master-dev-0   1/1       Unknown    6          30d
po/kube-controller-manager-k8s-master-dev-1   1/1       Running    4          118d
po/kube-controller-manager-k8s-master-dev-2   1/1       Running    4          164d
po/kube-dns-v20-5d9fdc7448-smf9s              3/3       Running    0          37d
po/kube-dns-v20-5d9fdc7448-vtjh4              3/3       Running    0          37d
po/kube-proxy-cklcx                           1/1       Running    1          118d
po/kube-proxy-dldnd                           1/1       Running    4          164d
po/kube-proxy-gg89s                           1/1       NodeLost   3          163d
po/kube-proxy-mrkqf                           1/1       Running    4          143d
po/kube-proxy-s95mm                           1/1       Running    10         164d
po/kube-proxy-zxnb7                           1/1       Running    2          164d
po/kube-scheduler-k8s-master-dev-0            1/1       Unknown    6          30d
po/kube-scheduler-k8s-master-dev-1            1/1       Running    6          118d
po/kube-scheduler-k8s-master-dev-2            1/1       Running    4          164d
po/kubernetes-dashboard-8555bd85db-4txtm      1/1       Running    0          37d
po/tiller-deploy-6677dc8d46-5n5cp             1/1       Running    0          37d

NAME                       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)         AGE
svc/heapster               ClusterIP   XX Redacted XX   <none>        80/TCP          164d
svc/kube-dns               ClusterIP   XX Redacted XX   <none>        53/UDP,53/TCP   164d
svc/kubernetes-dashboard   NodePort    XX Redacted XX   <none>        80:31279/TCP    164d
svc/tiller-deploy          ClusterIP   XX Redacted XX   <none>        44134/TCP       164d
-- Kshitij Karandikar
Source: StackOverflow

12/14/2018

The reason your kubelet went into a bad state is that upgrading the kubelet package replaced its systemd service file, so any changes you had made to it earlier were lost.

Here are some things you can try:

  1. Disable your swap memory: swapoff -a (and comment out any swap entry in /etc/fstab so it stays off after a reboot). Recent kubelet versions refuse to start while swap is enabled.
  2. Check your kubelet service file. For kubeadm it is located at /etc/systemd/system/kubelet.service.d/10-kubeadm.conf. Check the value of --cgroup-driver and, if it is systemd, change it to cgroupfs (the kubelet's driver must match the one your container runtime uses; see the check after the restart commands below). Then:

Reload the daemon and restart kubelet:

systemctl daemon-reload
systemctl restart kubelet
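
As a sanity check (not part of the original steps), the kubelet's cgroup driver has to match the one the container runtime uses; assuming Docker is the runtime here:

# cgroup driver the container runtime is using (prints e.g. "Cgroup Driver: cgroupfs")
docker info 2>/dev/null | grep -i "cgroup driver"

# cgroup driver currently passed to the kubelet
grep -- --cgroup-driver /etc/systemd/system/kubelet.service.d/10-kubeadm.conf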

Now check whether your kubelet has started.
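
If it still fails, the version skew itself may be the problem, since the install commands pulled the latest kubelet onto a v1.8.7 cluster. A hedged sketch for rolling the packages back, assuming a 1.8.7-00 package revision is still published in the apt repository:

# list the package revisions the repo actually offers
apt-cache madison kubelet

# pin back to the cluster's version (1.8.7-00 is an assumption; use a revision
# printed by the command above)
apt-get install -y --allow-downgrades kubelet=1.8.7-00 kubeadm=1.8.7-00 kubectl=1.8.7-00
apt-mark hold kubelet kubeadm kubectl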

PS: A live upgrade of a kubeadm control plane should be done carefully; check my answer on how to upgrade kubeadm:

how to upgrade kubernetes from v1.10.0 to v1.10.11

-- Prafull Ladha
Source: StackOverflow