Schedule few statefulset pods on one node and rest on other node in a kubernetes cluster

10/9/2020

I have a Kubernetes cluster with 3 worker nodes, and I need to deploy a StatefulSet app with 6 replicas. My requirement is that, in every case, each node gets exactly 2 of the 6 replicas. Basically:

node1 - 2 pods of app
node2 - 2 pods of app
node3 - 2 pods of app
========================
Total   6 pods of app        

Any help would be appreciated!

-- Nish
kubernetes
kubernetes-helm
kubernetes-statefulset

1 Answer

10/9/2020

You should use Pod Anti-Affinity to make sure that the pods are spread across different nodes.

Since you will have more than one pod per node, use preferredDuringSchedulingIgnoredDuringExecution (a hard requiredDuringScheduling rule on the hostname topology would allow only one matching pod per node).

Example where the app has the label app: mydb (use the label that fits your case):

    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - mydb
          topologyKey: "kubernetes.io/hostname"
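
For reference, a minimal sketch of where this block sits in a full StatefulSet manifest; the name mydb, the service name, and the image are placeholders, so adjust them to your chart/values:

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: mydb                  # placeholder name
    spec:
      serviceName: mydb           # placeholder headless service
      replicas: 6
      selector:
        matchLabels:
          app: mydb
      template:
        metadata:
          labels:
            app: mydb             # must match the labelSelector in the anti-affinity term
        spec:
          affinity:
            podAntiAffinity:
              preferredDuringSchedulingIgnoredDuringExecution:
              - weight: 100
                podAffinityTerm:
                  labelSelector:
                    matchExpressions:
                    - key: app
                      operator: In
                      values:
                      - mydb
                  topologyKey: "kubernetes.io/hostname"
          containers:
          - name: mydb
            image: mydb:latest    # placeholder image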

each node should get exactly 2 pods out of 6 replicas

Try not to think of the pods as pinned to specific nodes. The idea with Kubernetes workloads is that they are independent of the underlying infrastructure, such as nodes. What you really want - I assume - is to spread the pods to increase availability, e.g. so that if one node goes down, your system is still available.

If you are running at a cloud provider, you should probably design the anti-affinity so that the pods are scheduled to different Availability Zones and not only to different nodes - but this requires that your cluster is deployed across a Region (consisting of multiple Availability Zones).

Spread pods across Availability Zones
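
A minimal sketch of the same preferred anti-affinity term at the zone level, assuming your nodes carry the standard topology.kubernetes.io/zone label (older clusters may expose failure-domain.beta.kubernetes.io/zone instead):

    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - mydb
          topologyKey: "topology.kubernetes.io/zone"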

After an even distribution, all 3 nodes (spread over three zones) will have 2 pods each. That is OK. The hard requirement is: if one node (say node-1) goes down, its 2 pods need not be re-scheduled onto the other nodes. When node-1 is restored, those 2 pods should be scheduled back onto it. So, we can say, each of the 3 pairs of pods has a different node/zone affinity. Any idea around this?

This can be done with Pod Affinity, but is more naturally done using topologySpreadConstraints. You will probably use topologyKey: topology.kubernetes.io/zone, but this depends on what labels your nodes have.
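
A minimal sketch of what that could look like in the pod template spec, reusing the assumed app: mydb label (topology spread constraints are a stable feature as of Kubernetes 1.19):

    topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule      # hard constraint: keep zones within a skew of 1
      labelSelector:
        matchLabels:
          app: mydb
    - maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: ScheduleAnyway     # soft constraint: prefer an even spread across nodes
      labelSelector:
        matchLabels:
          app: mydb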

-- Jonas
Source: StackOverflow