How to remove kube taints from worker nodes: Taints node.kubernetes.io/unreachable:NoSchedule

6/15/2019

I was able to remove the taint from the master, but my two worker nodes, installed on bare metal with kubeadm, keep the unreachable taint even after I issue the command to remove it. The command says the taint was removed, but the removal is not permanent: when I check again, the taint is still there. I also tried patching the node spec to set the taints to null, but that did not work either. Everything I have found on SO or elsewhere either deals with the master or assumes these commands work.

UPDATE: I checked the timestamp of the taint and it is re-added the moment it is deleted. So in what sense is the node unreachable? I can ping it both ways between the master and the worker nodes. Are there any Kubernetes diagnostics I can run to find out how it is unreachable? Which log would show an error from the component that cannot connect?
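One way to narrow this down is to inspect the node conditions from the master and the kubelet service on the worker itself, since the unreachable taint reflects the node's Ready condition rather than network reachability. A minimal sketch, using the node names from this cluster:

    # From the master: dump the conditions the node controller acts on
    kubectl get node k8s-node1 -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.reason}{"\n"}{end}'

    # On the worker itself: check whether the kubelet is running and why it stopped
    systemctl status kubelet
    journalctl -u kubelet --since "1 hour ago" | tail -n 50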

kubectl describe no k8s-node1 | grep -i taint 
Taints:             node.kubernetes.io/unreachable:NoSchedule

Tried:

kubectl patch node k8s-node1 -p '{"spec":{"Taints":[]}}'
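(A side note on this attempt: the field in the node spec is lowercase taints, so the capitalized Taints key above is most likely ignored by the strategic merge patch. A corrected version would look like the sketch below, though it still would not stick here, because the node controller re-adds the taint for as long as it considers the node unreachable.)

    kubectl patch node k8s-node1 -p '{"spec":{"taints":[]}}'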

And

kubectl taint nodes --all node.kubernetes.io/unreachable:NoSchedule-
node/k8s-node1 untainted
node/k8s-node2 untainted
error: taint "node.kubernetes.io/unreachable:NoSchedule" not found

The result says untainted for the two worker nodes (the "not found" error is from the master, which no longer carries the taint), but the taint is back the next time I grep:

    kubectl describe no k8s-node1 | grep -i taint 
    Taints:             node.kubernetes.io/unreachable:NoSchedule
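The re-add can be watched in near real time; something like this (assuming the watch utility is installed) shows the taint reappear within seconds of removing it:

    watch -n 2 'kubectl get node k8s-node1 -o jsonpath="{.spec.taints}"'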


$ k get nodes
NAME         STATUS     ROLES    AGE   VERSION
k8s-master   Ready      master   10d   v1.14.2
k8s-node1    NotReady   <none>   10d   v1.14.2
k8s-node2    NotReady   <none>   10d   v1.14.2

UPDATE: I found someone with the same problem who could only fix it by resetting the cluster with kubeadm:

  https://forum.linuxfoundation.org/discussion/846483/lab2-1-kubectl-untainted-not-working

I sure hope I don't have to do that every time the worker nodes get tainted.

k describe node k8s-node2

Name:               k8s-node2
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=k8s-node2
                    kubernetes.io/os=linux
Annotations:        flannel.alpha.coreos.com/backend-data: {"VtepMAC":"d2:xx:61:c3:xx:16"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 10.xx.1.xx
                    kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Wed, 05 Jun 2019 11:46:12 +0700
Taints:             node.kubernetes.io/unreachable:NoSchedule
Unschedulable:      false
Conditions:
  Type             Status    LastHeartbeatTime                 LastTransitionTime                Reason              Message
  ----             ------    -----------------                 ------------------                ------              -------
  MemoryPressure   Unknown   Fri, 14 Jun 2019 10:34:07 +0700   Fri, 14 Jun 2019 10:35:09 +0700   NodeStatusUnknown   Kubelet stopped posting node status.
  DiskPressure     Unknown   Fri, 14 Jun 2019 10:34:07 +0700   Fri, 14 Jun 2019 10:35:09 +0700   NodeStatusUnknown   Kubelet stopped posting node status.
  PIDPressure      Unknown   Fri, 14 Jun 2019 10:34:07 +0700   Fri, 14 Jun 2019 10:35:09 +0700   NodeStatusUnknown   Kubelet stopped posting node status.
  Ready            Unknown   Fri, 14 Jun 2019 10:34:07 +0700   Fri, 14 Jun 2019 10:35:09 +0700   NodeStatusUnknown   Kubelet stopped posting node status.
Addresses:
  InternalIP:  10.10.10.xx
  Hostname:    k8s-node2
Capacity:
  cpu:                2
  ephemeral-storage:  26704124Ki
  memory:             4096032Ki
  pods:               110
Allocatable:
  cpu:                2
  ephemeral-storage:  24610520638
  memory:             3993632Ki
  pods:               110
System Info:
  Machine ID:                 6e4e4e32972b3b2f27f021dadc61d21
  System UUID:                6e4e4ds972b3b2f27f0cdascf61d21
  Boot ID:                    abfa0780-3b0d-sda9-a664-df900627be14
  Kernel Version:             4.4.0-87-generic
  OS Image:                   Ubuntu 16.04.3 LTS
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://17.3.3
  Kubelet Version:            v1.14.2
  Kube-Proxy Version:         v1.14.2
PodCIDR:                      10.xxx.10.1/24
Non-terminated Pods:          (18 in total)
  Namespace           Name                                                      CPU Requests  CPU Limits    Memory Requests  Memory Limits  AGE
  ---------           ----                                                      ------------  ----------    ---------------  -------------  ---
  heptio-sonobuoy     sonobuoy-systemd-logs-daemon-set-6a8d92061c324451-hnnp9   0 (0%)        0 (0%)        0 (0%)           0 (0%)         2d1h
  istio-system        istio-pilot-7955cdff46-w648c                              110m (5%)     2100m (105%)  228Mi (5%)       1224Mi (31%)   6h55m
  istio-system        istio-telemetry-5c9cb76c56-twzf5                          150m (7%)     2100m (105%)  228Mi (5%)       1124Mi (28%)   6h55m
  istio-system        zipkin-8594bbfc6b-9p2qc                                   0 (0%)        0 (0%)        1000Mi (25%)     1000Mi (25%)   6h55m
  knative-eventing    webhook-576479cc56-wvpt6                                  0 (0%)        0 (0%)        1000Mi (25%)     1000Mi (25%)   6h45m
  knative-monitoring  elasticsearch-logging-0                                   100m (5%)     1 (50%)       0 (0%)           0 (0%)         3d20h
  knative-monitoring  grafana-5cdc94dbd-mc4jn                                   100m (5%)     200m (10%)    100Mi (2%)       200Mi (5%)     3d21h
  knative-monitoring  kibana-logging-7cb6b64bff-dh8nx                           100m (5%)     1 (50%)       0 (0%)           0 (0%)         3d20h
  knative-monitoring  kube-state-metrics-56f68467c9-vr5cx                       223m (11%)    243m (12%)    176Mi (4%)       216Mi (5%)     3d21h
  knative-monitoring  node-exporter-7jw59                                       110m (5%)     220m (11%)    50Mi (1%)        90Mi (2%)      3d22h
  knative-monitoring  prometheus-system-0                                       0 (0%)        0 (0%)        400Mi (10%)      1000Mi (25%)   3d20h
  knative-serving     activator-6cfb97bccf-bfc4w                                120m (6%)     2200m (110%)  188Mi (4%)       1624Mi (41%)   6h45m
  knative-serving     autoscaler-85749b6c48-4wf6z                               130m (6%)     2300m (114%)  168Mi (4%)       1424Mi (36%)   6h45m
  knative-serving     controller-b49d69f4d-7j27s                                100m (5%)     1 (50%)       100Mi (2%)       1000Mi (25%)   6h45m
  knative-serving     networking-certmanager-5b5d8f5dd8-qjh5q                   100m (5%)     1 (50%)       100Mi (2%)       1000Mi (25%)   6h45m
  knative-serving     networking-istio-7977b9bbdd-vrpl5                         100m (5%)     1 (50%)       100Mi (2%)       1000Mi (25%)   6h45m
  kube-system         canal-qbn67                                               250m (12%)    0 (0%)        0 (0%)           0 (0%)         10d
  kube-system         kube-proxy-phbf5                                          0 (0%)        0 (0%)        0 (0%)           0 (0%)         10d
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                1693m (84%)   14363m (718%)
  memory             3838Mi (98%)  11902Mi (305%)
  ephemeral-storage  0 (0%)        0 (0%)
Events:              <none>
-- Steven Smart
kubeadm
kubectl
kubernetes

1 Answer

6/16/2019

The problem was that swap was turned on on the worker nodes, which made the kubelet crash and exit. This was evident from the syslog file under /var/log, and the taint keeps getting re-added until it is resolved. Perhaps someone can comment on the implications of allowing the kubelet to run with swap on:

kubelet[29207]: F0616 06:25:05.597536   29207 server.go:265] failed to run Kubelet: Running with swap on is not supported, please disable swap! or set --fail-swap-on flag to false. /proc/swaps contained: [Filename#011#011#011#011Type#011#011Size#011Used#011Priority /dev/xvda5                              partition#0114191228#0110#011-1]
Jun 16 06:25:05 k8s-node2 systemd[1]: kubelet.service: Main process exited, code=exited, status=255/n/a
Jun 16 06:25:05 k8s-node2 systemd[1]: kubelet.service: Unit entered failed state.
Jun 16 06:25:05 k8s-node2 systemd[1]: kubelet.service: Failed with result 'exit-code'.
Jun 16 06:25:15 k8s-node2 systemd[1]: kubelet.service: Service hold-off time over, scheduling restart.
Jun 16 06:25:15 k8s-node2 systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Jun 16 06:25:15 k8s-node2 systemd[1]: Started kubelet: The Kubernetes Node Agent.
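
For completeness, the usual fix is the standard kubeadm prerequisite: disable swap on each worker and restart the kubelet. A sketch (the sed pattern is one common way to comment out swap entries; check your /etc/fstab before running it):

    sudo swapoff -a                           # disable swap immediately
    sudo sed -i '/ swap / s/^/#/' /etc/fstab  # keep it off across reboots
    sudo systemctl restart kubelet

Alternatively, as the log message itself says, the kubelet can be started with --fail-swap-on=false, but running with swap on is not supported.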
-- Steven Smart
Source: StackOverflow