I have a K8s cluster that was working properly, but because of a power failure all the nodes got rebooted.
At the moment I am having trouble recovering the master (and the other nodes):
sudo systemctl kubelet status
is returning Unknown operation kubelet.
but when I run kubeadm init ... (the command I set up the cluster with), it returns:
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR Port-6443]: Port 6443 is in use
[ERROR Port-10251]: Port 10251 is in use
[ERROR Port-10252]: Port 10252 is in use
[ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
[ERROR Port-10250]: Port 10250 is in use
[ERROR Port-2379]: Port 2379 is in use
[ERROR Port-2380]: Port 2380 is in use
[ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
When I checked those ports, I could see that kubelet and the other K8s components are using them:
~/k8s-multi-node$ sudo lsof -i :10251
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
kube-sche 26292 root 3u IPv6 104933 0t0 TCP *:10251 (LISTEN)
~/k8s-multi-node$ sudo lsof -i :10252
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
kube-cont 26256 root 3u IPv6 115541 0t0 TCP *:10252 (LISTEN)
~/k8s-multi-node$ sudo lsof -i :10250
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
kubelet 24781 root 27u IPv6 106821 0t0 TCP *:10250 (LISTEN)
I tried to kill them, but they start using those ports again.
So what is the proper way to recover such a cluster? Do I need to remove kubelet and all the other components and install them again?
The control-plane processes keep coming back because kubelet restarts the static pods defined in /etc/kubernetes/manifests, so you first need to stop kubelet with sudo systemctl stop kubelet.service.
After that run kubeadm reset and then kubeadm init.
Note that this will clean up the existing cluster and create a new cluster altogether.
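A minimal sketch of that sequence, assuming a single control-plane node originally set up with kubeadm and that wiping the old cluster state is acceptable (the --pod-network-cidr value below is only an example; reuse whatever flags you passed to kubeadm init the first time):

# stop kubelet so it no longer restarts the static control-plane pods
sudo systemctl stop kubelet.service

# tear down the old control plane (clears /etc/kubernetes/manifests,
# /var/lib/etcd and frees the ports from the preflight errors)
sudo kubeadm reset -f

# re-initialise the control plane
sudo kubeadm init --pod-network-cidr=10.244.0.0/16

# point kubectl at the new admin kubeconfig
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

After that, re-apply your pod network add-on, run the kubeadm join command printed by kubeadm init on each worker node, and verify with kubectl get nodes.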
Regarding the proper way to recover, check this question.