Kubernetes DaemonSets in the presence of tolerations

7/12/2019

I am thinking about partitioning my Kubernetes cluster into zones of dedicated nodes for exclusive use by dedicated sets of users as discussed here. I am wondering how tainting nodes would affect DaemonSets, including those that are vital to cluster operation (e.g. kube-proxy, kube-flannel-ds-amd64).

The documentation says daemon pods respect taints and tolerations. But if so, how can the system schedule e.g. kube-proxy pods on nodes tainted with kubectl taint nodes node-x zone=zone-y:NoSchedule when the pod (which is not under my control but owned by Kubernetes' own DaemonSet kube-proxy) does not carry a corresponding toleration?

What I have found empirically so far is that Kubernetes 1.14 reschedules a kube-proxy pod regardless (after I have deleted it on the tainted node-x), which seems to contradict the documentation. On the other hand, this does not seem to be the case for my own DaemonSet. When I kill its pod on node-x, it only gets rescheduled after I remove the node's taint (or presumably after I add a toleration to the pod's spec inside the DaemonSet).

So how do DaemonSets and tolerations interoperate in detail? Could it be that certain DaemonSets (such as kube-proxy, kube-flannel-ds-amd64) are treated specially?

-- rookie099
daemonset
kubernetes

1 Answer

7/12/2019

Your kube-proxy and flannel daemonsets will have many tolerations defined in their manifests, which means their pods get scheduled even on tainted nodes.

Here are a couple from my canal daemonset:

tolerations:
  - effect: NoSchedule
    operator: Exists
  - key: CriticalAddonsOnly
    operator: Exists
  - effect: NoExecute
    operator: Exists

Here are the taints from one of my master nodes:

taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/controlplane
    value: "true"
  - effect: NoExecute
    key: node-role.kubernetes.io/etcd
    value: "true"

Even though most workloads won't be scheduled on the master because of its NoSchedule and NoExecute taints, a canal pod will run there because the daemonset tolerates them: a toleration with operator: Exists and no key matches every taint with the given effect.
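
For your own DaemonSet, the pod template would need a toleration that matches the zone taint. Here is a minimal sketch, assuming the taint zone=zone-y:NoSchedule from the kubectl command in your question; the DaemonSet name, label, and image are hypothetical placeholders:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: my-agent                # hypothetical name
spec:
  selector:
    matchLabels:
      app: my-agent             # hypothetical label
  template:
    metadata:
      labels:
        app: my-agent
    spec:
      tolerations:
        # Tolerates exactly the taint zone=zone-y:NoSchedule
        - key: zone
          operator: Equal
          value: zone-y
          effect: NoSchedule
      containers:
        - name: agent
          image: example.com/my-agent:1.0   # hypothetical image
```

If the pods should run in every zone, using key: zone with operator: Exists and no value would tolerate the zone taint whatever its value is.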

The doc you already linked to goes into detail.

-- switchboard.op
Source: StackOverflow