Why Can the Pod Created by This Manifest Be Deployed to the GPU Worker Without Specifying a nodeSelector?

2/27/2020

I have created a Pod by running:

kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.9/nvidia-device-plugin.yml

However, I notice that there is no nodeSelector. How, then, can the Pod be correctly deployed to the target GPU machine? Why did it skip the master machine? AFAIK, a DaemonSet deploys its Pods to every node in the cluster, not just a subset, unless a nodeSelector is specified.

Relevant part of the Pod's description (kubectl describe output):

QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     CriticalAddonsOnly
                 node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/pid-pressure:NoSchedule
                 node.kubernetes.io/unreachable:NoExecute
                 node.kubernetes.io/unschedulable:NoSchedule

Events:

Cluster Information:
2 machines: one is the master, which has only a CPU, and the other is the worker, which has both a CPU and a GPU.

kubernetes: 1.15

-- Steve
daemonset
kubernetes
nvidia
pod

2 Answers

2/28/2020

Check the taints on all your nodes by running:

kubectl describe node <node-name> | grep -i taint

The master node by default has a taint which prevents regular Pods from being scheduled on it. Note that the master node is intended to run only the Kubernetes control plane components, which live in the kube-system namespace. This ensures the stability of the Kubernetes cluster and is entirely justified: generally, deploying any additional production workload on the master node is a bad idea. For non-production setups such as Minikube, where you have a single-node cluster, it is perfectly acceptable. You may want to familiarize yourself with taints and tolerations to understand this concept better.
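For illustration, on a typical kubeadm-based cluster the check above could produce something like this (master-node and worker-node are placeholder names, not taken from your cluster):

$ kubectl describe node master-node | grep -i taint
Taints:             node-role.kubernetes.io/master:NoSchedule

$ kubectl describe node worker-node | grep -i taint
Taints:             <none>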

Apart from adding tolerations to the Pod specification, you may consider removing taints from your nodes (also described here):

Removing a taint from a node

You can use kubectl taint to remove taints. You can remove taints by key, key-value, or key-effect.

For example, the following command removes from node foo all the taints with key dedicated:

kubectl taint nodes foo dedicated-
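In your particular case, if you wanted the master node to accept regular workloads, you could remove its default taint the same way (a sketch assuming the node carries the standard node-role.kubernetes.io/master taint shown further below; substitute your real node name):

kubectl taint nodes <master-node-name> node-role.kubernetes.io/master-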

Important: taints are not labels, despite what some comments may suggest. They may be related to labels, but they are not labels themselves.

You can easily check it by running:

kubectl get node <tainted-node> -o yaml

and you'll see all currently applied taints in the node's spec section:

spec:
  ...
  taints:
  - effect: NoSchedule
    key: key
    value: some-value

Typically, on the master node you have the following taint:

spec:
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master

and among many other labels there is a specific one related to this taint:

node-role.kubernetes.io/master: ""
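You can confirm which labels (as opposed to taints) a node carries by listing them:

kubectl get node <node-name> --show-labels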

Answering the specific question you posted in the title:

Why Can the Pod Created by This Manifest Be Deployed to the GPU Worker Without Specifying a nodeSelector?

I recommend reading more about the different mechanisms Kubernetes uses for assigning Pods to Nodes in this article. You'll see that a nodeSelector can be used if you want to guarantee that your Pod is scheduled on a node with specific labels. However, the absence of a nodeSelector doesn't prevent your Pods from being scheduled on such a node. Pods can still be scheduled on any node that has no taints (taints are what keep Pods without the matching toleration off a node), or on a node that does have taints, provided the Pod tolerates them.
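For completeness, here is a rough sketch of what using a nodeSelector could look like; the hardware-type=gpu label is made up for this example and would first have to be added to your worker node:

kubectl label node <worker-node-name> hardware-type=gpu

and then in the Pod spec:

spec:
  nodeSelector:
    hardware-type: gpu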

In your case, the worker node is the only node without any taints that would keep Pods lacking the matching toleration off it. Since the worker node has no taints, the Pod is scheduled there.

If you want to prevent other Pods, e.g. ones that don't require a GPU, from being scheduled on that specific node, you can create your own taint; then only Pods that carry the matching toleration can be scheduled on that node.
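A rough sketch of that approach, using a made-up taint key dedicated (the same key as in the documentation example above) rather than anything from your cluster:

kubectl taint nodes <worker-node-name> dedicated=gpu:NoSchedule

Pods that should still land on the GPU node then need a matching toleration in their spec:

tolerations:
- key: "dedicated"
  operator: "Equal"
  value: "gpu"
  effect: "NoSchedule"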

I hope it cleared your doubts.

-- mario
Source: StackOverflow

2/27/2020

This is a DaemonSet, which means it will get deployed to all worker nodes.

From the docs

If you specify a .spec.template.spec.nodeSelector, then the DaemonSet controller will create Pods on nodes which match that node selector. Likewise if you specify a .spec.template.spec.affinity, then DaemonSet controller will create Pods on nodes which match that node affinity. If you do not specify either, then the DaemonSet controller will create Pods on all nodes.
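For illustration, a DaemonSet restricted to GPU nodes via a nodeSelector could look roughly like this; the accelerator=nvidia label, the names, and the image are placeholders, not part of the NVIDIA manifest:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: gpu-only-agent
spec:
  selector:
    matchLabels:
      app: gpu-only-agent
  template:
    metadata:
      labels:
        app: gpu-only-agent
    spec:
      nodeSelector:
        accelerator: nvidia
      containers:
      - name: agent
        image: registry.example.com/gpu-agent:latest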

Since Kubernetes 1.6, DaemonSets are not scheduled on master nodes by default. If you add the toleration below, the DaemonSet will get deployed to the master nodes as well as all worker nodes.

tolerations:
- key: node-role.kubernetes.io/master
  effect: NoSchedule

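If you want to try this on the already deployed plugin, one way (assuming the DaemonSet from the linked manifest lives in kube-system and is named nvidia-device-plugin-daemonset, which you can confirm with kubectl get ds -n kube-system) is to edit it in place and add the toleration under .spec.template.spec.tolerations:

kubectl -n kube-system edit daemonset nvidia-device-plugin-daemonset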
-- Arghya Sadhu
Source: StackOverflow