How to force Pods/Deployments to Master nodes?

2/2/2017

I've set up a Kubernetes 1.5 cluster with the three master nodes tainted dedicated=master:NoSchedule. Now I want to deploy the Nginx Ingress Controller on the master nodes only, so I've added tolerations:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-ingress-controller
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
spec:
  replicas: 3
  template:
    metadata:
      labels:
        k8s-app: nginx-ingress-lb
        name: nginx-ingress-lb
      annotations:
        scheduler.alpha.kubernetes.io/tolerations: |
          [
            {
              "key": "dedicated",
              "operator": "Equal",
              "value": "master",
              "effect": "NoSchedule"
            }
          ]
    spec:
    […]

Unfortunately this does not have the desired effect: Kubernetes schedules all Pods on the workers. When scaling the number of replicas to a larger number the Pods are deployed on the workers, too.

How can I achieve scheduling to the Master nodes only?

Thanks for your help.

-- Stephan
kubernetes

3 Answers

4/11/2020
  tolerations:
  - key: node-role.kubernetes.io/master
    effect: NoSchedule
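
Note that a toleration by itself only allows the pods onto the masters; it does not force them there. Assuming your masters carry the node-role.kubernetes.io/master label (kubeadm sets this by default, with an empty value), you can pin the pods by pairing the toleration with a nodeSelector, for example:

```yaml
spec:
  template:
    spec:
      # Only schedule onto nodes carrying the master role label
      nodeSelector:
        node-role.kubernetes.io/master: ""
      # And tolerate the master taint so the scheduler allows it
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
```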
-- Boeboe
Source: StackOverflow

2/2/2017

A toleration does not mean that the pod must be scheduled on a node with such taints; it only means that the pod tolerates such a taint. If you want your pod to be "attracted" to specific nodes you will need to attach a label to your dedicated=master tainted nodes and set a nodeSelector in the pod spec that matches that label.

Attach the label to each of your special use nodes:

kubectl label nodes name_of_your_node dedicated=master
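
To label all three masters and double-check that the labels are in place before rolling out the Deployment, you can do something like this (the node names are examples; this needs access to a live cluster):

```shell
# Label each of the three masters (node names are hypothetical)
kubectl label nodes master-1 master-2 master-3 dedicated=master

# Verify: only the three masters should be listed
kubectl get nodes -l dedicated=master
```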

Kubernetes 1.6 and above syntax

Add the nodeSelector to your pod:

apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: nginx-ingress-controller
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
spec:
  replicas: 3
  template:
    metadata:
      labels:
        k8s-app: nginx-ingress-lb
        name: nginx-ingress-lb
    spec:
      nodeSelector:
        dedicated: master
      tolerations:
      - key: dedicated
        operator: Equal
        value: master
        effect: NoSchedule
    […]

If you don't fancy nodeSelector you can add affinity: under spec: instead:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: dedicated
          operator: In
          values: ["master"]

Pre 1.6 syntax

Add the nodeSelector to your pod:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-ingress-controller
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
spec:
  replicas: 3
  template:
    metadata:
      labels:
        k8s-app: nginx-ingress-lb
        name: nginx-ingress-lb
      annotations:
        scheduler.alpha.kubernetes.io/tolerations: |
          [
            {
              "key": "dedicated",
              "operator": "Equal",
              "value": "master",
              "effect": "NoSchedule"
            }
          ]
    spec:
      nodeSelector:
        dedicated: master
    […]

If you don't fancy nodeSelector you can also add an annotation like this:

scheduler.alpha.kubernetes.io/affinity: >
  {
    "nodeAffinity": {
      "requiredDuringSchedulingIgnoredDuringExecution": {
        "nodeSelectorTerms": [
          {
            "matchExpressions": [
              {
                "key": "dedicated",
                "operator": "In",
                "values": ["master"]
              }
            ]
          }
        ]
      }
    }
  }

Keep in mind that a NoSchedule taint will not evict pods that are already scheduled on the node; only a NoExecute taint does that.
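
If you also want pods that are already running on a master without a matching toleration to be removed, you can add a NoExecute taint (a sketch; the node name is an example, and this requires a live cluster):

```shell
# Pods without a matching NoExecute toleration are evicted from this node
kubectl taint nodes master-1 dedicated=master:NoExecute
```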

The information above is from https://kubernetes.io/docs/user-guide/node-selection/ and there are more details there.

-- Janos Lenart
Source: StackOverflow

2/2/2017

You might want to dive into the Assigning Pods to Nodes documentation. Basically, you should add a label to your nodes with something like this:

kubectl label nodes <node-name> <label-key>=<label-value>

and then reference that within your Pod specification like this:

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
  nodeSelector:
    label: value

But I'm not sure whether this works for non-critical add-ons when the specific node is tainted. More details can be found here

-- pagid
Source: StackOverflow