How does weight affect pod scheduling when affinity rules are set?

1/15/2020

Background:

While performance testing an application, I was getting inconsistent results when scaling the replicas for my php-fpm containers. I then realized that 3 of the 4 pods were scheduled on the same node.

I then configured pod anti-affinity rules so that pods would not be scheduled on the same node. I quickly realized that requiredDuringSchedulingIgnoredDuringExecution was not an option, because then I could not have more replicas than nodes, so I configured preferredDuringSchedulingIgnoredDuringExecution instead.

For the most part, my pods appear to be scheduled evenly across all my nodes. However, sometimes (observed during a rolling upgrade) I see pods land on the same node. I suspect the weight value, currently set to 100, is a factor.

Here is the YAML I am using (Helm template):

      {{- if .Values.podAntiAffinity }}
      {{- if .Values.podAntiAffinity.enabled }}
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: "{{ .Values.deploymentName }}"
              topologyKey: "kubernetes.io/hostname"
      {{- end }}
      {{- end }}

Questions:

The way I read the documentation, the weight is added to a calculated score for the node based on how busy it is (simplified). What I don't understand is how a weight of 1 vs. 100 would make any difference.

Why are pods sometimes scheduled on the same node despite this rule? Is it because the total score for the node the pod was not scheduled on is too low (because that node is too busy)?

Is there a way to see a log or event showing why a pod was scheduled on a particular node? I'd expect kubectl describe pod to have those details, but it seemingly does not (except in an error scenario).

-- leeman24
kubernetes

1 Answer

1/16/2020

preferredDuringSchedulingIgnoredDuringExecution is not guaranteed.

There are two types of node affinity, called requiredDuringSchedulingIgnoredDuringExecution and preferredDuringSchedulingIgnoredDuringExecution. You can think of them as "hard" and "soft" respectively, in the sense that the former specifies rules that must be met for a pod to be scheduled onto a node (just like nodeSelector but using a more expressive syntax), while the latter specifies preferences that the scheduler will try to enforce but will not guarantee.
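For contrast, here is a minimal sketch of the "hard" form of the same anti-affinity rule (assuming the same app label and hostname topology key as in your question). With this variant, a replica that cannot be placed on its own node stays Pending instead of doubling up:

      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          # Hard rule: never co-locate two pods with this app label on one node.
          - labelSelector:
              matchLabels:
                app: "{{ .Values.deploymentName }}"
            topologyKey: "kubernetes.io/hostname"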

The weight you set gives that term an edge, but there are other scoring parameters (some set by the user, some by Kubernetes itself) with their own weights. The example below should give a better picture of where the weight you set matters:

 affinity:
   nodeAffinity:
     preferredDuringSchedulingIgnoredDuringExecution:
     # Nodes labeled example.com/myLabel=a get the larger bonus (weight 40)...
     - preference:
         matchExpressions:
         - key: example.com/myLabel
           operator: In
           values:
           - a
       weight: 40
     # ...while nodes labeled example.com/myLabel=b get a smaller one (weight 35).
     - preference:
         matchExpressions:
         - key: example.com/myLabel
           operator: In
           values:
           - b
       weight: 35
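To make the trade-off concrete within the anti-affinity rule itself, here is a hedged sketch (the zone term is an added assumption, not something from your question) where two preferred terms carry different weights. A node already running a matching pod on the same hostname accumulates a larger penalty than a node that merely shares a zone, so the spread-by-host preference wins when the two conflict:

 affinity:
   podAntiAffinity:
     preferredDuringSchedulingIgnoredDuringExecution:
     # Strong preference: avoid nodes already running a replica (weight 100).
     - weight: 100
       podAffinityTerm:
         labelSelector:
           matchLabels:
             app: "{{ .Values.deploymentName }}"
         topologyKey: "kubernetes.io/hostname"
     # Weaker, illustrative preference: also try to spread across zones (weight 20).
     - weight: 20
       podAffinityTerm:
         labelSelector:
           matchLabels:
             app: "{{ .Values.deploymentName }}"
         topologyKey: "topology.kubernetes.io/zone"

The same kind of trade-off happens against the scheduler's other soft scoring criteria: the higher your weight relative to the rest, the more the preference counts, but even a weight of 100 can be outscored when the only nodes without a replica are also the busiest.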
-- ffran09
Source: StackOverflow