Kubernetes master stuck on NotReady after failed upgrade

3/25/2020

I have a K8s cluster running version 1.13.2, and I want to upgrade to version 1.17.x (the latest 1.17).

I looked at the official notes: https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/ which state that I need to upgrade one minor version at a time, meaning 1.14, then 1.15, 1.16, and only then 1.17.

I made all the preparations (disabled swap), followed the docs, and determined that the latest 1.14 release is 1.14.10.

When I ran:

apt-mark unhold kubeadm kubelet && \
apt-get update && apt-get install -y kubeadm=1.14.10-00 && \
apt-mark hold kubeadm

For some reason it seems that kubelet v1.18 was installed as well.

I continued and tried running sudo kubeadm upgrade plan, but it failed with the following error:

[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
[upgrade/health] FATAL: [preflight] Some fatal errors occurred:
    [ERROR ControlPlaneNodesReady]: there are NotReady control-planes in the cluster: [<name of master>]
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`

When running kubectl get nodes, the master's STATUS is indeed NotReady and its VERSION is v1.18.0, while the workers are of course still Ready on v1.13.2 (unchanged).

How can I fix my cluster?

And what did I do wrong when I tried upgrading?

-- ChikChak
kubernetes

1 Answer

3/26/2020

I reproduced your problem in my lab, and what happened is that you accidentally upgraded more than you intended. More specifically, you upgraded the kubelet package on your master node (control plane).
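You can see this dependency relationship up front with apt-cache (a sketch; the exact Depends lines vary by package version and repository):

```shell
# Show what kubeadm depends on; kubelet appears in the list, so
# installing a newer kubeadm while kubelet is unheld can pull
# kubelet to a much newer version than intended.
apt-cache depends kubeadm

# Show the installed vs. candidate version of kubelet, so you know
# which version apt would jump to if the hold were removed.
apt-cache policy kubelet
```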

So here is my healthy cluster with version 1.13.2:

$ kubectl get nodes
NAME            STATUS   ROLES    AGE     VERSION
kubeadm-lab-0   Ready    master   9m25s   v1.13.2
kubeadm-lab-1   Ready    <none>   6m17s   v1.13.2
kubeadm-lab-2   Ready    <none>   6m9s    v1.13.2

Now I will unhold kubeadm and kubelet as you did:

$ sudo apt-mark unhold kubeadm kubelet
Canceled hold on kubeadm.
Canceled hold on kubelet.

And finally I will upgrade kubeadm to 1.14.10:

$ sudo apt-get install kubeadm=1.14.10-00
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following additional packages will be installed:
  conntrack kubelet kubernetes-cni
The following NEW packages will be installed:
  conntrack
The following packages will be upgraded:
  kubeadm kubelet kubernetes-cni
3 upgraded, 1 newly installed, 0 to remove and 8 not upgraded.
Need to get 34.1 MB of archives.
After this operation, 7,766 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:2 http://deb.debian.org/debian stretch/main amd64 conntrack amd64 1:1.4.4+snapshot20161117-5 [32.9 kB]
Get:1 https://packages.cloud.google.com/apt kubernetes-xenial/main amd64 kubelet amd64 1.18.0-00 [19.4 MB]
Get:3 https://packages.cloud.google.com/apt kubernetes-xenial/main amd64 kubeadm amd64 1.14.10-00 [8,155 kB]
Get:4 https://packages.cloud.google.com/apt kubernetes-xenial/main amd64 kubernetes-cni amd64 0.7.5-00 [6,473 kB]
Fetched 34.1 MB in 2s (13.6 MB/s)         
Selecting previously unselected package conntrack.
(Reading database ... 97656 files and directories currently installed.)
Preparing to unpack .../conntrack_1%3a1.4.4+snapshot20161117-5_amd64.deb ...
Unpacking conntrack (1:1.4.4+snapshot20161117-5) ...
Preparing to unpack .../kubelet_1.18.0-00_amd64.deb ...
Unpacking kubelet (1.18.0-00) over (1.13.2-00) ...
Preparing to unpack .../kubeadm_1.14.10-00_amd64.deb ...
Unpacking kubeadm (1.14.10-00) over (1.13.2-00) ...
Preparing to unpack .../kubernetes-cni_0.7.5-00_amd64.deb ...
Unpacking kubernetes-cni (0.7.5-00) over (0.6.0-00) ...
Setting up conntrack (1:1.4.4+snapshot20161117-5) ...
Setting up kubernetes-cni (0.7.5-00) ...
Setting up kubelet (1.18.0-00) ...
Processing triggers for man-db (2.7.6.1-2) ...
Setting up kubeadm (1.14.10-00) ...

As you can see in this output, kubelet got upgraded to the latest version (1.18.0) because it is a dependency of kubeadm and was no longer held. Now my master node is NotReady, just like yours:

$ kubectl get nodes
NAME            STATUS     ROLES    AGE     VERSION
kubeadm-lab-0   NotReady   master   7m      v1.18.0
kubeadm-lab-1   Ready      <none>   3m52s   v1.13.2
kubeadm-lab-2   Ready      <none>   3m44s   v1.13.2

How to fix it? Downgrade the packages that were mistakenly upgraded:

$ sudo apt-get install -y \
--allow-downgrades \
--allow-change-held-packages \
kubelet=1.13.2-00 \
kubeadm=1.13.2-00 \
kubectl=1.13.2-00 \
kubernetes-cni=0.6.0-00

After running this command, wait a few moments and check your nodes:

$ kubectl get nodes
NAME            STATUS   ROLES    AGE     VERSION
kubeadm-lab-0   Ready    master   9m25s   v1.13.2
kubeadm-lab-1   Ready    <none>   6m17s   v1.13.2
kubeadm-lab-2   Ready    <none>   6m9s    v1.13.2

How to successfully upgrade it?

You have to carefully check the impact of apt-get install before running it, and make sure that every package will be upgraded to the desired version.
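One way to do that (a sketch using standard apt-get options) is to simulate the install first and review what apt would actually do:

```shell
# -s (--simulate) prints the planned actions without touching the system.
# The "Inst" lines show each package and the version apt would install.
apt-get install -s kubeadm=1.14.10-00 kubelet=1.14.10-00 | grep '^Inst'

# List all versions of a package available from the configured repos,
# useful for finding the exact version string to pin.
apt-cache madison kubelet
```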

In my cluster I upgraded with the following command on my master node:

$ sudo apt-mark unhold kubeadm kubelet && \
sudo apt-get update && \
sudo apt-get install -y kubeadm=1.14.10-00 kubelet=1.14.10-00 && \
sudo apt-mark hold kubeadm kubelet

My master node got upgraded to the desired version:

$ kubectl get nodes
NAME            STATUS   ROLES    AGE   VERSION
kubeadm-lab-0   Ready    master   58m   v1.14.10
kubeadm-lab-1   Ready    <none>   55m   v1.13.2
kubeadm-lab-2   Ready    <none>   55m   v1.13.2

Now if I run sudo kubeadm upgrade plan, I get the following output:

$ sudo kubeadm upgrade plan
[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.13.12
[upgrade/versions] kubeadm version: v1.14.10
I0326 10:08:44.926849   21406 version.go:240] remote version is much newer: v1.18.0; falling back to: stable-1.14
[upgrade/versions] Latest stable version: v1.14.10
[upgrade/versions] Latest version in the v1.13 series: v1.13.12

Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
COMPONENT   CURRENT        AVAILABLE
Kubelet     2 x v1.13.2    v1.14.10
            1 x v1.14.10   v1.14.10

Upgrade to the latest stable version:

COMPONENT            CURRENT    AVAILABLE
API Server           v1.13.12   v1.14.10
Controller Manager   v1.13.12   v1.14.10
Scheduler            v1.13.12   v1.14.10
Kube Proxy           v1.13.12   v1.14.10
CoreDNS              1.2.6      1.3.1
Etcd                 3.2.24     3.3.10

You can now apply the upgrade by executing the following command:

    kubeadm upgrade apply v1.14.10

_____________________________________________________________________

As you can see in the message, kubelet must be upgraded manually on all nodes, so I ran the following command on my other two nodes:

$ sudo apt-mark unhold kubeadm kubelet kubernetes-cni && \
sudo apt-get update && \
sudo apt-get install -y kubeadm=1.14.10-00 kubelet=1.14.10-00 && \
sudo apt-mark hold kubeadm kubelet kubernetes-cni

And finally I proceed with:

$ sudo kubeadm upgrade apply v1.14.10
[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.14.10". Enjoy!

[upgrade/kubelet] Now that your control plane is upgraded, please proceed with upgrading your kubelets if you haven't already done so.
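One detail worth adding (per the standard kubeadm upgrade flow; this assumes kubelet runs as a systemd service, which is the default for the deb packages): after upgrading the kubelet package on a node, restart the service so the node starts reporting the new version:

```shell
# Reload unit files and restart kubelet so the new binary takes over.
sudo systemctl daemon-reload
sudo systemctl restart kubelet

# Confirm all nodes are Ready and report the expected version.
kubectl get nodes
```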
-- mWatney
Source: StackOverflow