Background:
While performance testing an application, I was getting inconsistent results when scaling the replicas for my php-fpm containers, and I realized that 3 of the 4 pods were scheduled on the same node.
I then configured anti-affinity rules so that pods would not be scheduled on the same node. I quickly realized that requiredDuringSchedulingIgnoredDuringExecution was not an option, because with it I could not have more replicas than nodes, so I configured preferredDuringSchedulingIgnoredDuringExecution instead.
For the most part my pods look evenly scheduled across all my nodes; however, sometimes (observed during a rolling upgrade) I still see pods land on the same node. I suspect the weight value, which is currently set to 100, is a factor.
Here is the YAML I am using (Helm):
{{- if .Values.podAntiAffinity }}
{{- if .Values.podAntiAffinity.enabled }}
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: "{{ .Values.deploymentName }}"
        topologyKey: "kubernetes.io/hostname"
{{- end }}
{{- end }}
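With podAntiAffinity.enabled set to true and deploymentName set to, say, php-fpm (a placeholder value used only for illustration), that template renders to roughly:

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    # Prefer, but do not require, nodes that do not already run a pod with this label.
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: "php-fpm"
        topologyKey: "kubernetes.io/hostname"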
Questions:
The way I read the documentation, the weight number is added to a calculated score for the node based on how busy it is (simplified); what I don't understand is how a weight of 1 vs. a weight of 100 would be any different.
Why are pods sometimes scheduled on the same node with this rule? Is it because the total score for the node that the pod wasn't scheduled on is too low (as it is too busy)?
Is there a way to see a log or event of how a pod came to be scheduled on a particular node? I'd expect kubectl describe pod to have those details, but seemingly it does not (except in an error scenario).
preferredDuringSchedulingIgnoredDuringExecution is not guaranteed. From the Kubernetes documentation:
There are two types of node affinity, called requiredDuringSchedulingIgnoredDuringExecution and preferredDuringSchedulingIgnoredDuringExecution. You can think of them as “hard” and “soft” respectively, in the sense that the former specifies rules that must be met for a pod to be scheduled onto a node (just like nodeSelector but using a more expressive syntax), while the latter specifies preferences that the scheduler will try to enforce but will not guarantee.
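For contrast, here is a minimal sketch of the “hard” variant of the rule from the question (the one ruled out because it caps the replica count at the number of eligible nodes); required terms are plain pod affinity terms and carry no weight:

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    # Refuse to schedule on any node that already runs a pod with this label.
    - labelSelector:
        matchLabels:
          app: "{{ .Values.deploymentName }}"
      topologyKey: "kubernetes.io/hostname"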
The weight you set gives the preferred nodes an edge, but there are other parameters (set by the user and by Kubernetes itself) with their own weights. The example below should give a better picture of how the weight you set matters:
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - preference:
        matchExpressions:
        - key: example.com/myLabel
          operator: In
          values:
          - a
      weight: 40
    - preference:
        matchExpressions:
        - key: example.com/myLabel
          operator: In
          values:
          - b
      weight: 35
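Per the documentation's simplified model, the scheduler sums the weight of every preferred term a node satisfies and combines that with the node's other scoring signals (resource usage, image locality, and so on). A weight of 100 therefore only makes the anti-affinity preference count for more relative to those other signals than a weight of 1 would; a node that is otherwise very attractive can still win even though it already runs a matching pod, which is why two replicas occasionally end up together. As a sketch of how relative weights can be combined in the question's own pod anti-affinity rule (the zone term and its weight of 50 are illustrative assumptions, not part of the original chart):

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    # Strong preference: avoid nodes that already run a pod of this app.
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: "{{ .Values.deploymentName }}"
        topologyKey: "kubernetes.io/hostname"
    # Weaker, illustrative preference: also try to spread across zones.
    - weight: 50
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: "{{ .Values.deploymentName }}"
        topologyKey: "topology.kubernetes.io/zone"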