Why does the Kubernetes scheduler ignore nodeAffinity?

7/15/2019

I have a Kubernetes 1.12 cluster deployed to AWS with kops.

The cluster has several nodes labeled with 'example.com/myLabel', which takes the values a, b, c, and d.

For example:

Node name    example.com/myLabel
instance1    a
instance2    b
instance3    c
instance4    d

And there is a test deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-scheduler
spec:
  replicas: 6
  selector:
    matchLabels:
      app: test-scheduler
  template:
    metadata:
      labels:
        app: test-scheduler
    spec:
      tolerations:
        - key: spot
          operator: Exists
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - preference:
                matchExpressions:
                  - key: example.com/myLabel
                    operator: In
                    values:
                      - a
              weight: 40
            - preference:
                matchExpressions:
                  - key: example.com/myLabel
                    operator: In
                    values:
                      - b
              weight: 35
            - preference:
                matchExpressions:
                  - key: example.com/myLabel
                    operator: In
                    values:
                      - c
              weight: 30
            - preference:
                matchExpressions:
                  - key: example.com/myLabel
                    operator: In
                    values:
                      - d
              weight: 25
      containers:
        - name: a
          resources:
            requests:
              cpu: "100m"
              memory: "50Mi"
            limits:
              cpu: "100m"
              memory: "50Mi"
          image: busybox
          command:
            - 'sleep'
            - '99999'

According to the documentation, for each candidate node the scheduler adds up the weights of the preference terms that node matches, and the node with the highest weight sum should be chosen.

I expect all pods to be scheduled to node instance1 (label 'a'), but in my case the nodes are chosen seemingly at random.

For example, here are the 5 nodes that the 6 pods from the deployment were scheduled onto, including the nodes another1 and another2, which do not carry my label at all (while another node with the label value 'd' received nothing):

NODE        LABEL
another1    NONE
node1       a
node2       b
node3       c
another2    NONE
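As a sanity check (the node and label names are the ones from this question), the label value per node and the node each replica landed on can be listed with:

# Show every node with the value of example.com/myLabel as an extra column
kubectl get nodes -L example.com/myLabel

# Show which node each pod of the test deployment was scheduled onto
kubectl get pods -l app=test-scheduler -o wide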

All nodes have free capacity, are available, and can run pods.

I have 2 questions

  1. Why does this happen?

  2. Where does the k8s scheduler log information on how a node is chosen for a pod? The events do not contain this information, and the scheduler logs on the masters are empty (see the sketch below).
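For reference, one way to surface the scoring would be to raise the scheduler's log verbosity, since the scheduler only prints its per-node scores at a high log level. This is only a sketch, assuming your kops version exposes spec.kubeScheduler.logLevel and that the scheduler runs as a static pod on the masters; the pod name below is a placeholder:

# Raise scheduler verbosity in the kops cluster spec, then roll the masters
kops edit cluster                     # set spec.kubeScheduler.logLevel: 10
kops update cluster --yes
kops rolling-update cluster --yes

# Read the per-node scores from the scheduler pod on the active master
kubectl -n kube-system logs kube-scheduler-<master-node-name>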

UPDATE:

My nodes do carry the labels correctly, in key=value form:

example.com/myLabel=a
example.com/myLabel=b
example.com/myLabel=c
example.com/myLabel=d
-- Pavel Kurdikov
kops
kubernetes
kubernetes-pod

2 Answers

7/15/2019

If you put a label on your nodes with only the value, it won't work; you have to label each node with the full key=value pair. For example, running kubectl describe on one node of one of my clusters on GCP shows:

Labels:         beta.kubernetes.io/arch=amd64
                beta.kubernetes.io/fluentd-ds-ready=true
                beta.kubernetes.io/instance-type=n1-standard-2
                beta.kubernetes.io/os=linux

You have to set your labels correctly, as:

example.com/myLabel=a

With that, your nodes are correctly classified
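A minimal sketch, reusing the node names from the question, of how the full key=value labels would be applied and verified:

# Apply the complete key=value label to each node (repeat with b, c, d for the other nodes)
kubectl label nodes instance1 example.com/myLabel=a

# Verify that the label shows up under Labels: on the node
kubectl describe node instance1 | grep myLabel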

-- wolmi
Source: StackOverflow

7/18/2019

preferredDuringSchedulingIgnoredDuringExecution just means that the scheduler will add the weight you set to the score it computes when choosing which node to schedule to. It is a preference, not a hard rule.

With the weights you set, you will get a somewhat even spread. You would need to have a very large sample size before you would start to see the spread you are aiming for.

Keep in mind that the "weight" from your affinity is not the only input: other scheduler priority functions (such as spreading replicas of the same Deployment across nodes) contribute their own scores as well. If you want to see the effect more clearly, use a much greater weight difference between the affinity terms.
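If you want the label to dominate the decision instead of merely nudging it, one option is to make the presence of the label a hard requirement and keep a single strongly weighted preference for 'a'. This is only a sketch, assuming every node you want to schedule onto carries one of the four values:

affinity:
  nodeAffinity:
    # Hard rule: only nodes that carry the label at all are considered
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: example.com/myLabel
              operator: In
              values: ["a", "b", "c", "d"]
    # Soft rule: strongly prefer 'a' over the other values
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
            - key: example.com/myLabel
              operator: In
              values: ["a"]

Even then, the default spreading priority will keep trying to distribute the 6 replicas across the eligible nodes, so do not expect all of them to land on instance1 unless you restrict the required term to 'a' alone.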

-- Patrick W
Source: StackOverflow