I was able to remove the taint from the master, but my two worker nodes, installed bare metal with kubeadm, keep the unreachable taint even after I issue the command to remove it. The command reports the taint as removed, but the removal is not permanent: when I check again, the taint is still there. I also tried patching the taints to an empty list, but that did not work either. Everything I have found on SO or elsewhere either deals with the master or assumes these commands work.
UPDATE: I checked the timestamp on the taint, and it is added back the moment it is deleted. So in what sense is the node unreachable? I can ping it, in both directions between the master and the worker nodes. Are there any Kubernetes diagnostics I can run to find out how it is unreachable? Which log would show an error indicating which component cannot connect?
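For anyone else hitting this: "unreachable" here does not mean network-unreachable. The taint is added by the control plane when a node's kubelet stops posting status updates to the API server, so ping succeeding proves nothing either way. The place the error shows up is the kubelet's own log on the worker node. A minimal check, assuming the systemd-managed kubelet that kubeadm sets up:

systemctl status kubelet                       # is the kubelet running at all?
journalctl -u kubelet --no-pager | tail -n 50  # its most recent log lines, including crash reasons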
kubectl describe no k8s-node1 | grep -i taint
Taints: node.kubernetes.io/unreachable:NoSchedule
Tried:
kubectl patch node k8s-node1 -p '{"spec":{"taints":[]}}'
And
kubectl taint nodes --all node.kubernetes.io/unreachable:NoSchedule-
node/k8s-node1 untainted
node/k8s-node2 untainted
error: taint "node.kubernetes.io/unreachable:NoSchedule" not found
The result says untainted for the two worker nodes, but the taint is back again the next time I grep:
kubectl describe no k8s-node1 | grep -i taint
Taints: node.kubernetes.io/unreachable:NoSchedule
$ k get nodes
NAME         STATUS     ROLES    AGE   VERSION
k8s-master   Ready      master   10d   v1.14.2
k8s-node1    NotReady   <none>   10d   v1.14.2
k8s-node2    NotReady   <none>   10d   v1.14.2
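That NotReady status is the answer to why the deletions never stick: the node lifecycle controller in kube-controller-manager re-applies node.kubernetes.io/unreachable to any node whose Ready condition has gone Unknown, moments after the taint is removed. To see the condition that drives the taint (a sketch; the jsonpath just filters the Ready entry out of the node's status conditions):

kubectl get node k8s-node1 -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}{"\n"}'
kubectl get node k8s-node1 -o jsonpath='{.status.conditions[?(@.type=="Ready")].message}{"\n"}'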
UPDATE: Found someone who had the same problem and could only fix it by resetting the cluster with kubeadm:
https://forum.linuxfoundation.org/discussion/846483/lab2-1-kubectl-untainted-not-working
I sure hope I don't have to do that every time the worker nodes get tainted.
k describe node k8s-node2
Name: k8s-node2
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=k8s-node2
kubernetes.io/os=linux
Annotations: flannel.alpha.coreos.com/backend-data: {"VtepMAC":"d2:xx:61:c3:xx:16"}
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: true
flannel.alpha.coreos.com/public-ip: 10.xx.1.xx
kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Wed, 05 Jun 2019 11:46:12 +0700
Taints: node.kubernetes.io/unreachable:NoSchedule
Unschedulable: false
Conditions:
Type            Status   LastHeartbeatTime                 LastTransitionTime                Reason             Message
----            ------   -----------------                 ------------------                ------             -------
MemoryPressure  Unknown  Fri, 14 Jun 2019 10:34:07 +0700   Fri, 14 Jun 2019 10:35:09 +0700   NodeStatusUnknown  Kubelet stopped posting node status.
DiskPressure    Unknown  Fri, 14 Jun 2019 10:34:07 +0700   Fri, 14 Jun 2019 10:35:09 +0700   NodeStatusUnknown  Kubelet stopped posting node status.
PIDPressure     Unknown  Fri, 14 Jun 2019 10:34:07 +0700   Fri, 14 Jun 2019 10:35:09 +0700   NodeStatusUnknown  Kubelet stopped posting node status.
Ready           Unknown  Fri, 14 Jun 2019 10:34:07 +0700   Fri, 14 Jun 2019 10:35:09 +0700   NodeStatusUnknown  Kubelet stopped posting node status.
Addresses:
InternalIP: 10.10.10.xx
Hostname: k8s-node2
Capacity:
cpu: 2
ephemeral-storage: 26704124Ki
memory: 4096032Ki
pods: 110
Allocatable:
cpu: 2
ephemeral-storage: 24610520638
memory: 3993632Ki
pods: 110
System Info:
Machine ID: 6e4e4e32972b3b2f27f021dadc61d21
System UUID: 6e4e4ds972b3b2f27f0cdascf61d21
Boot ID: abfa0780-3b0d-sda9-a664-df900627be14
Kernel Version: 4.4.0-87-generic
OS Image: Ubuntu 16.04.3 LTS
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://17.3.3
Kubelet Version: v1.14.2
Kube-Proxy Version: v1.14.2
PodCIDR: 10.xxx.10.1/24
Non-terminated Pods: (18 in total)
Namespace           Name                                                      CPU Requests  CPU Limits    Memory Requests  Memory Limits  AGE
---------           ----                                                      ------------  ----------    ---------------  -------------  ---
heptio-sonobuoy     sonobuoy-systemd-logs-daemon-set-6a8d92061c324451-hnnp9   0 (0%)        0 (0%)        0 (0%)           0 (0%)         2d1h
istio-system        istio-pilot-7955cdff46-w648c                              110m (5%)     2100m (105%)  228Mi (5%)       1224Mi (31%)   6h55m
istio-system        istio-telemetry-5c9cb76c56-twzf5                          150m (7%)     2100m (105%)  228Mi (5%)       1124Mi (28%)   6h55m
istio-system        zipkin-8594bbfc6b-9p2qc                                   0 (0%)        0 (0%)        1000Mi (25%)     1000Mi (25%)   6h55m
knative-eventing    webhook-576479cc56-wvpt6                                  0 (0%)        0 (0%)        1000Mi (25%)     1000Mi (25%)   6h45m
knative-monitoring  elasticsearch-logging-0                                   100m (5%)     1 (50%)       0 (0%)           0 (0%)         3d20h
knative-monitoring  grafana-5cdc94dbd-mc4jn                                   100m (5%)     200m (10%)    100Mi (2%)       200Mi (5%)     3d21h
knative-monitoring  kibana-logging-7cb6b64bff-dh8nx                           100m (5%)     1 (50%)       0 (0%)           0 (0%)         3d20h
knative-monitoring  kube-state-metrics-56f68467c9-vr5cx                       223m (11%)    243m (12%)    176Mi (4%)       216Mi (5%)     3d21h
knative-monitoring  node-exporter-7jw59                                       110m (5%)     220m (11%)    50Mi (1%)        90Mi (2%)      3d22h
knative-monitoring  prometheus-system-0                                       0 (0%)        0 (0%)        400Mi (10%)      1000Mi (25%)   3d20h
knative-serving     activator-6cfb97bccf-bfc4w                                120m (6%)     2200m (110%)  188Mi (4%)       1624Mi (41%)   6h45m
knative-serving     autoscaler-85749b6c48-4wf6z                               130m (6%)     2300m (114%)  168Mi (4%)       1424Mi (36%)   6h45m
knative-serving     controller-b49d69f4d-7j27s                                100m (5%)     1 (50%)       100Mi (2%)       1000Mi (25%)   6h45m
knative-serving     networking-certmanager-5b5d8f5dd8-qjh5q                   100m (5%)     1 (50%)       100Mi (2%)       1000Mi (25%)   6h45m
knative-serving     networking-istio-7977b9bbdd-vrpl5                         100m (5%)     1 (50%)       100Mi (2%)       1000Mi (25%)   6h45m
kube-system         canal-qbn67                                               250m (12%)    0 (0%)        0 (0%)           0 (0%)         10d
kube-system         kube-proxy-phbf5                                          0 (0%)        0 (0%)        0 (0%)           0 (0%)         10d
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource           Requests      Limits
--------           --------      ------
cpu                1693m (84%)   14363m (718%)
memory             3838Mi (98%)  11902Mi (305%)
ephemeral-storage  0 (0%)        0 (0%)
Events: <none>
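The Conditions block above is the give-away: every condition went Unknown with "Kubelet stopped posting node status", so the question becomes why the kubelet stopped. On these Ubuntu 16.04 nodes the kubelet's output also lands in the system log, so a grep there (an illustrative command, assuming the default rsyslog setup) surfaces the crash:

grep -i kubelet /var/log/syslog | tail -n 20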
The problem was that swap was turned on on the worker nodes, which made the kubelet crash and exit. This was evident from the syslog file under /var/log, so the taint will keep getting re-added until it is resolved. Perhaps someone can comment on the implications of allowing the kubelet to run with swap on:
kubelet[29207]: F0616 06:25:05.597536 29207 server.go:265] failed to run Kubelet: Running with swap on is not supported, please disable swap! or set --fail-swap-on flag to false. /proc/swaps contained: [Filename#011#011#011#011Type#011#011Size#011Used#011Priority /dev/xvda5 partition#0114191228#0110#011-1]
Jun 16 06:25:05 k8s-node2 systemd[1]: kubelet.service: Main process exited, code=exited, status=255/n/a
Jun 16 06:25:05 k8s-node2 systemd[1]: kubelet.service: Unit entered failed state.
Jun 16 06:25:05 k8s-node2 systemd[1]: kubelet.service: Failed with result 'exit-code'.
Jun 16 06:25:15 k8s-node2 systemd[1]: kubelet.service: Service hold-off time over, scheduling restart.
Jun 16 06:25:15 k8s-node2 systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Jun 16 06:25:15 k8s-node2 systemd[1]: Started kubelet: The Kubernetes Node Agent.
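For the record, the fix that follows from that error message, sketched for systemd-managed Ubuntu workers like these (run on each affected node):

sudo swapoff -a                             # turn swap off immediately
sudo sed -i '/\sswap\s/s/^/#/' /etc/fstab   # comment out swap entries so it stays off after reboot
sudo systemctl restart kubelet              # kubelet now passes its swap check and starts

Once the kubelet is posting status again, the node goes back to Ready and the controller removes the unreachable taint on its own; no kubeadm reset required. The error message also offers --fail-swap-on=false as an escape hatch, but as I understand it running with swap on is unsupported because the kubelet's memory accounting and eviction logic assume swap is off.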