I have 3 nodes, each of them labeled as follows:
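node-0: mongo-volume=volume-0
node-1: mongo-volume=volume-1
node-2: mongo-volume=volume-2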
I'm looking for a way to schedule the replicas of a StatefulSet on a specific node.
I first used the hard way with requiredDuringSchedulingIgnoredDuringExecution and everything worked well.
Then I wanted to test the soft way with preferredDuringSchedulingIgnoredDuringExecution.
I first told my StatefulSet to prefer the node with the label volume-0; no problem, the pods were all deployed on node-0.
Then I changed the preference to the node with the label volume-1. And there is my problem: the pods were deployed on node-0 and node-2, but not on node-1.
I did the same with the label volume-2 and it worked well again; the pods were all deployed on node-2.
Node Affinity configuration:
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      preference:
        matchExpressions:
        - key: mongo-volume
          operator: In
          values:
          - volume-1
When I looked at the resource usage of the nodes, I noticed that node-1 had a bit more load than the others. Could that explain why the scheduler refuses to deploy the pods on this node?
NAME     CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
node-0   63m          6%     795Mi           41%
node-1   116m         11%    978Mi           51%
node-2   78m          7%     752Mi           39%
I'm wondering why it works for node-0 and node-2 but not for node-1, and whether there is a way to fix it.
A preferred affinity policy means "prefer to run on this node"; it does not directly select the node.
The weight of an affinity term is the priority of that affinity policy relative to the others. For example:
podAntiAffinity:
  preferredDuringSchedulingIgnoredDuringExecution:
  - weight: 100
    podAffinityTerm:
      labelSelector:
        matchExpressions:
        - key: k1
          operator: In
          values:
          - "v1"
      topologyKey: kubernetes.io/hostname
  - weight: 30
    podAffinityTerm:
      labelSelector:
        matchExpressions:
        - key: k2
          operator: In
          values:
          - "v2"
      topologyKey: kubernetes.io/hostname
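Here the term with weight 100 influences a node's score more strongly than the term with weight 30; the weights only shift the score, they don't guarantee placement.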
K8s scheduler doc says:
kube-scheduler selects a node for the pod in a 2-step operation:
Filtering
Scoring
The filtering step finds the set of Nodes where it’s feasible to schedule the Pod. For example, the PodFitsResources filter checks whether a candidate Node has enough available resource to meet a Pod’s specific resource requests. After this step, the node list contains any suitable Nodes; often, there will be more than one. If the list is empty, that Pod isn’t (yet) schedulable.
In the scoring step, the scheduler ranks the remaining nodes to choose the most suitable Pod placement. The scheduler assigns a score to each Node that survived filtering, basing this score on the active scoring rules.
Finally, kube-scheduler assigns the Pod to the Node with the highest ranking. If there is more than one node with equal scores, kube-scheduler selects one of these at random.
Affinity is part of the scheduler's consideration, but not all of it.
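With a preferred term, the node-affinity score is only added to the other scoring results (requested resources, spreading of replicas across nodes, and so on), so another node can still come out ahead. If the pods really must run on node-1, the reliable way is the hard requirement you already tested; a minimal sketch, assuming the same mongo-volume label:
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: mongo-volume
          operator: In
          values:
          - volume-1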