Node affinity ignored for a node

7/26/2019

I have 3 nodes, each labeled as follows:

  • node-0 -> mongo-volume=volume-0
  • node-1 -> mongo-volume=volume-1
  • node-2 -> mongo-volume=volume-2

I'm looking for a way to schedule the replicas of a StatefulSet on specific nodes.

I first used the hard way with requiredDuringSchedulingIgnoredDuringExecution and everything worked well.
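
Roughly, the hard version looked like this (not my exact manifest, but the shape of it):

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: mongo-volume
          operator: In
          values:
          - volume-0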

Then I wanted to test the soft way by using preferredDuringSchedulingIgnoredDuringExecution.

I first told my StatefulSet to prefer the node with the label volume-0; no problem, the pods were all deployed on node-0.

Then I changed the preference to the node with the label volume-1. And there is my problem: the pods were deployed on node-0 and node-2, but not on node-1.

I did the same with the label volume-2 and it worked well again; the pods were all deployed on node-2.

Node Affinity configuration:

affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      preference:
        matchExpressions:
        - key: mongo-volume
          operator: In
          values:
          - volume-1  
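
For context, this block sits under the pod template of the StatefulSet (spec.template.spec.affinity). The names and image below are placeholders, not my real manifest:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongo              # placeholder name
spec:
  serviceName: mongo       # placeholder headless service
  replicas: 3
  selector:
    matchLabels:
      app: mongo
  template:
    metadata:
      labels:
        app: mongo
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              - key: mongo-volume
                operator: In
                values:
                - volume-1
      containers:
      - name: mongo
        image: mongo       # placeholder image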

When I looked at the resource usage of the nodes, I noticed that node-1 had a bit more load than the others. Could that explain why the scheduler refuses to deploy the pods on this node?

NAME    CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
node-0   63m          6%     795Mi           41%
node-1   116m         11%    978Mi           51%
node-2   78m          7%     752Mi           39%
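
For reference, node usage in this format can be obtained with the following command, assuming metrics-server is running in the cluster:

kubectl top nodes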

I'm wondering why it works for node-0 and node-2 but not for node-1, and whether there is a way to fix it.

-- LaurentP22
kubectl
kubernetes
kubernetes-statefulset

1 Answer

8/9/2019

A preferred affinity policy means the scheduler prefers to run the pod on that node; it does not directly select the node.

The weight sets the relative priority among your preferred affinity terms: for each candidate node, the scheduler adds the weight of every term the node matches to that node's score. For example:

podAntiAffinity:
  preferredDuringSchedulingIgnoredDuringExecution:
  - weight: 100
    podAffinityTerm:
      labelSelector:
        matchExpressions:
        - key: k1
          operator: In
          values:
          - "v1"
      topologyKey: kubernetes.io/hostname
  - weight: 30
    podAffinityTerm:
      labelSelector:
        matchExpressions:
        - key: k2
          operator: In
          values:
          - "v2"
      topologyKey: kubernetes.io/hostname

The kube-scheduler documentation says:

kube-scheduler selects a node for the pod in a 2-step operation:

  1. Filtering

  2. Scoring

The filtering step finds the set of Nodes where it’s feasible to schedule the Pod. For example, the PodFitsResources filter checks whether a candidate Node has enough available resource to meet a Pod’s specific resource requests. After this step, the node list contains any suitable Nodes; often, there will be more than one. If the list is empty, that Pod isn’t (yet) schedulable.

In the scoring step, the scheduler ranks the remaining nodes to choose the most suitable Pod placement. The scheduler assigns a score to each Node that survived filtering, basing this score on the active scoring rules.

Finally, kube-scheduler assigns the Pod to the Node with the highest ranking. If there is more than one node with equal scores, kube-scheduler selects one of these at random.

Preferred affinity is part of that scoring, but not all of it.
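
If the replicas must land on the node labeled volume-1 rather than merely prefer it, a hard constraint keeps the other nodes out of the filtering step entirely. A minimal sketch using the label from the question (the requiredDuringSchedulingIgnoredDuringExecution form the question already tried works the same way):

# in the pod template spec of the StatefulSet
nodeSelector:
  mongo-volume: volume-1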

-- Ray
Source: StackOverflow