Kubernetes anti-affinity rule to spread Deployment Pods to at least 2 nodes

7/19/2021

I have the following anti-affinity rule configured in my k8s Deployment:

spec:
  ...
  selector:
    matchLabels:
      app: my-app
      environment: qa
  ...
  template:
    metadata:
      labels:
        app: my-app
        environment: qa
        version: v0
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - my-app
            topologyKey: kubernetes.io/hostname

With this rule I state that I do not want any Pod replica to be scheduled onto a node of my k8s cluster where a Pod of the same application is already present. So, for instance, having:

nodes(a,b,c) = 3
replicas(1,2,3) = 3

replica_1 scheduled on node_a, replica_2 scheduled on node_b and replica_3 scheduled on node_c

As a result, each Pod is scheduled on a different node.

However, I was wondering if there is a way to express: "I want to spread my Pods across at least 2 nodes" to guarantee high availability without forcing every Pod onto a separate node, for example:

nodes(a,b,c) = 3
replicas(1,2,3) = 3

replica_1 scheduled on node_a, replica_2 scheduled on node_b and replica_3 scheduled (again) on node_a

So, to sum up, I would like a softer constraint that allows me to guarantee high availability by spreading a Deployment's replicas across at least 2 nodes, without having to provision a node for each Pod of a given application.
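
For reference, I am aware of the soft variant of the same rule (preferredDuringSchedulingIgnoredDuringExecution). A rough sketch of what it would look like is below, but as far as I understand it only expresses a preference for spreading and does not actually guarantee that at least 2 nodes are used:

spec:
  ...
  template:
    ...
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - my-app
              topologyKey: kubernetes.io/hostname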

Thanks!

-- Luca Tartarini
affinity
kubernetes
kubernetes-pod
scheduler
scheduling

1 Answer

7/26/2021

I think I found a solution to your problem. Look at this example yaml file:

spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        example: app
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - worker-1
            - worker-2
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 50
        preference:
          matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - worker-1

The idea of this configuration: I'm using nodeAffinity here to indicate on which nodes the Pods can be placed:

- key: kubernetes.io/hostname

and

values:
- worker-1
- worker-2

It is important to set the following line:

- maxSkew: 1

According to the documentation:

maxSkew describes the degree to which Pods may be unevenly distributed. It must be greater than zero.

Thanks to this, the difference in the number of Pods assigned to the nodes will never exceed 1.
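
As a rough illustration, assuming 3 replicas and the two allowed nodes from the example above (worker-1 and worker-2), maxSkew: 1 combined with whenUnsatisfiable: DoNotSchedule permits only distributions in which the per-node Pod counts differ by at most 1:

# worker-1: 2 Pods, worker-2: 1 Pod   -> allowed  (skew = 2 - 1 = 1)
# worker-1: 1 Pod,  worker-2: 2 Pods  -> allowed  (skew = 2 - 1 = 1)
# worker-1: 3 Pods, worker-2: 0 Pods  -> rejected (skew = 3 - 0 = 3 > maxSkew)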

This section:

      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 50
        preference:
          matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - worker-1

is optional; however, it allows you to fine-tune the Pod distribution across the available nodes even further. The documentation describes the difference between requiredDuringSchedulingIgnoredDuringExecution and preferredDuringSchedulingIgnoredDuringExecution:

Thus an example of requiredDuringSchedulingIgnoredDuringExecution would be "only run the pod on nodes with Intel CPUs" and an example preferredDuringSchedulingIgnoredDuringExecution would be "try to run this set of pods in failure zone XYZ, but if it's not possible, then allow some to run elsewhere".
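
Once the Deployment is rolled out, a quick way to confirm how the replicas were spread is to list the Pods together with their assigned nodes (assuming the Pods carry the example: app label used in the labelSelector above):

kubectl get pods -l example=app -o wide

The NODE column in the output shows on which node each replica ended up.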

-- Mikołaj Głodziak
Source: StackOverflow