We are currently using a few clusters running v1.8.7 (created months ago by developers who are no longer available) and are trying to upgrade to a higher version. We wanted to try this first on a cluster we use for experiments and POCs.
While doing so, we tried to run a few kubeadm commands on one of the master nodes, but kubeadm was not found.
So we tried installing it with the following commands:
apt-get update && apt-get install -y apt-transport-https curl
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb https://apt.kubernetes.io/ kubernetes-xenial main
EOF
apt-get update
apt-get install -y kubelet kubeadm kubectl
apt-mark hold kubelet kubeadm kubectl
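(Note: looking at it now, this installs whatever the latest published packages are, not the cluster's v1.8.7. If the goal had been to match the existing cluster version, I assume the pinned form would have looked something like the following; the exact 1.8.7-00 package revision is a guess based on the usual naming of the Kubernetes apt packages.)
# hypothetical pinned install to match the cluster's existing v1.8.7
apt-get install -y kubelet=1.8.7-00 kubeadm=1.8.7-00 kubectl=1.8.7-00
apt-mark hold kubelet kubeadm kubectl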
However, that node now has status NotReady and the kubelet service is failing.
Any pointers on how to fix this, and what should we have done?
root@k8s-master-dev-0:/home/azureuser# kubectl get nodes
NAME               STATUS     ROLES     AGE       VERSION
k8s-master-dev-0   NotReady   master    118d      v1.8.7
k8s-master-dev-1   Ready      master    118d      v1.8.7
k8s-master-dev-2   Ready      master    163d      v1.8.7
k8s-agents-dev-0   Ready      agent     163d      v1.8.7
k8s-agents-dev-1   Ready      agent     163d      v1.8.7
k8s-agents-dev-2   Ready      agent     163d      v1.8.7
root@k8s-master-dev-0:/home/azureuser# systemctl status kubelet.service
● kubelet.service - Kubelet
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: failed (Result: start-limit-hit) since Thu 2018-12-13 14:33:25 UTC; 18h ago
Dec 13 14:33:25 k8s-master-dev-0 systemd[1]: kubelet.service: Control process exited, code=exited status=2
Dec 13 14:33:25 k8s-master-dev-0 systemd[1]: Failed to start Kubelet.
Dec 13 14:33:25 k8s-master-dev-0 systemd[1]: kubelet.service: Unit entered failed state.
Dec 13 14:33:25 k8s-master-dev-0 systemd[1]: kubelet.service: Failed with result 'exit-code'.
Dec 13 14:33:25 k8s-master-dev-0 systemd[1]: kubelet.service: Service hold-off time over, scheduling restart.
Dec 13 14:33:25 k8s-master-dev-0 systemd[1]: Stopped Kubelet.
Dec 13 14:33:25 k8s-master-dev-0 systemd[1]: kubelet.service: Start request repeated too quickly.
Dec 13 14:33:25 k8s-master-dev-0 systemd[1]: Failed to start Kubelet.
Dec 13 14:33:25 k8s-master-dev-0 systemd[1]: kubelet.service: Unit entered failed state.
Dec 13 14:33:25 k8s-master-dev-0 systemd[1]: kubelet.service: Failed with result 'start-limit-hit'.
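The status output above only shows that the start process exited with status=2; I assume the actual error message is further back in the journal, which we can dump with something like:
# last 100 kubelet log lines, without paging
journalctl -u kubelet --no-pager -n 100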
Is this a clean Kubernetes cluster?
I think you should be careful about installing kubelet, kubeadm, and kubectl
on a LIVE Kubernetes cluster.
You can find more information about reconfiguring the kubelet
on a live cluster here: https://kubernetes.io/docs/tasks/administer-cluster/reconfigure-kubelet/
Can you show me the output of:
kubectl get all --namespace kube-system
@wrogrammer
root@k8s-master-dev-0:/var/log/apt# kubectl get all --namespace kube-system
NAME            DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR                 AGE
ds/kube-proxy   6         6         5         6            5           beta.kubernetes.io/os=linux   164d

NAME                          DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deploy/heapster               1         1         1            1           164d
deploy/kube-dns-v20           2         2         2            2           164d
deploy/kubernetes-dashboard   1         1         1            1           164d
deploy/tiller-deploy          1         1         1            1           164d

NAME                                 DESIRED   CURRENT   READY     AGE
rs/heapster-75f8df9884               1         1         1         164d
rs/heapster-7d6ffbf65                0         0         0         164d
rs/kube-dns-v20-5d9fdc7448           2         2         2         164d
rs/kubernetes-dashboard-8555bd85db   1         1         1         164d
rs/tiller-deploy-6677dc8d46          1         1         1         163d
rs/tiller-deploy-86d6cf59b           0         0         0         164d
NAME                                          READY     STATUS     RESTARTS   AGE
po/heapster-75f8df9884-nxn2z                  2/2       Running    0          37d
po/kube-addon-manager-k8s-master-dev-0        1/1       Unknown    4          30d
po/kube-addon-manager-k8s-master-dev-1        1/1       Running    4          118d
po/kube-addon-manager-k8s-master-dev-2        1/1       Running    2          164d
po/kube-apiserver-k8s-master-dev-0            1/1       Unknown    4          30d
po/kube-apiserver-k8s-master-dev-1            1/1       Running    4          118d
po/kube-apiserver-k8s-master-dev-2            1/1       Running    2          164d
po/kube-controller-manager-k8s-master-dev-0   1/1       Unknown    6          30d
po/kube-controller-manager-k8s-master-dev-1   1/1       Running    4          118d
po/kube-controller-manager-k8s-master-dev-2   1/1       Running    4          164d
po/kube-dns-v20-5d9fdc7448-smf9s              3/3       Running    0          37d
po/kube-dns-v20-5d9fdc7448-vtjh4              3/3       Running    0          37d
po/kube-proxy-cklcx                           1/1       Running    1          118d
po/kube-proxy-dldnd                           1/1       Running    4          164d
po/kube-proxy-gg89s                           1/1       NodeLost   3          163d
po/kube-proxy-mrkqf                           1/1       Running    4          143d
po/kube-proxy-s95mm                           1/1       Running    10         164d
po/kube-proxy-zxnb7                           1/1       Running    2          164d
po/kube-scheduler-k8s-master-dev-0            1/1       Unknown    6          30d
po/kube-scheduler-k8s-master-dev-1            1/1       Running    6          118d
po/kube-scheduler-k8s-master-dev-2            1/1       Running    4          164d
po/kubernetes-dashboard-8555bd85db-4txtm      1/1       Running    0          37d
po/tiller-deploy-6677dc8d46-5n5cp             1/1       Running    0          37d

NAME                       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)         AGE
svc/heapster               ClusterIP   XX Redacted XX   <none>        80/TCP          164d
svc/kube-dns               ClusterIP   XX Redacted XX   <none>        53/UDP,53/TCP   164d
svc/kubernetes-dashboard   NodePort    XX Redacted XX   <none>        80:31279/TCP    164d
svc/tiller-deploy          ClusterIP   XX Redacted XX   <none>        44134/TCP       164d
The reason your kubelet went into a bad state is that upgrading the kubelet package replaced its service file, so any changes that had been made to it earlier were lost.
Here are some things you can try.
First, disable swap (newer kubelet versions refuse to start while swap is enabled):
swapoff -a
Then open /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
and check the value of --cgroup-driver: if it is systemd, change it to cgroupfs.
Then reload the daemon and restart the kubelet:
systemctl daemon-reload
systemctl restart kubelet
Now check whether the kubelet has started.
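For example, assuming Docker is the container runtime on this node, you can confirm that the kubelet's --cgroup-driver matches the driver Docker reports, and then watch the service come up:
# driver reported by Docker; the kubelet flag must match it
docker info 2>/dev/null | grep -i 'cgroup driver'
# verify the service state and follow the logs
systemctl status kubelet
journalctl -u kubelet -f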
PS: A live upgrade of a kubeadm control plane should be done carefully; check my answer on how to upgrade kubeadm.
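As a rough sketch of the supported path (the version numbers here are only illustrative; kubeadm can only upgrade one minor version at a time, so from 1.8 you would step through 1.9, 1.10, and so on):
# illustrative versions; pick the real target from 'kubeadm upgrade plan'
apt-get install -y kubeadm=1.9.11-00
kubeadm upgrade plan
kubeadm upgrade apply v1.9.11
# then, on each node, upgrade the kubelet package to the same version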