While following the Kubernetes article on Using kubeadm to Create a Cluster, I got stuck because the AddOn pods I was trying to install (Nginx, Tiller, Grafana, InfluxDB, Dashboard) always stayed in the Pending state.
Running kubectl describe pod tiller-deploy-df4fdf55d-jwtcz --namespace=kube-system showed the following event:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 51s (x15 over 3m) default-scheduler 0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
When I ran the command from the Master Isolation section, kubectl taint nodes --all node-role.kubernetes.io/master-, the AddOns installed as expected.
At this point I can only suspect (because they are already installed on the master node) that the reason was that I hadn't connected a worker node to the cluster yet for the scheduler to schedule the pods on.
The documentation states "your cluster will not schedule pods on the master for security reasons". I know that this is a non-production environment, so there is little risk in this situation, but what is the risk of removing that taint in a production cluster?
Follow-up: If this is a risk, how can I re-add that taint so I can then uninstall the AddOn pods and try to have the scheduler install them on my Worker Node?
Environment Details:
Operating System: CentOS 7.4.1708 (Core)
Kubernetes Version: 1.10
"the reason was that I hadn't connected a worker node to the cluster yet for the scheduler to schedule the pods on."
100% correct. You will for sure want some worker nodes, otherwise the idea of "scheduling work" becomes very weird.
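If you want to confirm that diagnosis on your own cluster, a quick sketch like the following lists each node with its taints. The function name list_node_taints is just something I made up for illustration; it only filters the Name and Taints lines out of the normal kubectl describe nodes output.

```shell
# Print the "Name:" and "Taints:" lines for every node in the cluster.
# A single node whose taint is node-role.kubernetes.io/master:NoSchedule
# means ordinary pods have nowhere to be scheduled.
list_node_taints() {
  kubectl describe nodes | grep -E '^(Name|Taints):'
}
```

Calling list_node_taints on a freshly initialized kubeadm cluster with no workers joined should show only the master and its NoSchedule taint.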
"but what is the risk of removing that taint in a production cluster?"
I am not a Kubernetes security expert, but a pragmatic risk is CPU, I/O, or memory exhaustion on the master nodes, which would have very severe consequences for the health of the whole cluster. There is almost never a good reason to run ordinary workloads on a master node, and doing so almost always just adds risk, so the advice "just don't do it" is well founded.
"how can I re-add that taint so I can then uninstall the AddOn pods and try to have the scheduler install them on my Worker Node?"
I'm not sure I follow that question, but I would for sure start by just adding a worker node before trying to do complicated stuff with taints and tolerations.
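That said, if the goal is simply to put the default taint back once a worker has joined, something along these lines should do it on a 1.10 cluster. Treat it as a sketch under assumptions: the function names and the node name master-1 are hypothetical, the taint shown is the one kubeadm applies by default (key node-role.kubernetes.io/master, empty value, effect NoSchedule), and the pods are assumed to be managed by a controller such as a Deployment so that they get recreated after deletion.

```shell
# Re-apply the default kubeadm master taint to the named node.
# NoSchedule only affects new scheduling decisions; pods that are
# already running on the node stay where they are.
retaint_master() {
  kubectl taint nodes "$1" node-role.kubernetes.io/master=:NoSchedule
}

# Delete a pod so that its controller recreates it. With the taint back,
# the replacement can only land on a node it tolerates, i.e. a worker.
reschedule_pod() {
  kubectl delete pod "$1" --namespace="$2"
}
```

For example, retaint_master master-1 followed by reschedule_pod tiller-deploy-df4fdf55d-jwtcz kube-system would push Tiller off the master, provided an untainted worker node is available.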