I have a Kubernetes cluster with 3 worker nodes on which I need to deploy a StatefulSet
app with 6 replicas.
My requirement is to make sure that, in every case, each node gets exactly 2 of the 6 replicas. Basically:
node1 - 2 pods of app
node2 - 2 pods of app
node3 - 2 pods of app
========================
Total 6 pods of app
Any help would be appreciated!
You should use Pod Anti-Affinity to make sure that the pods are spread across different nodes.
Since you will have more than one pod on each node, use preferredDuringSchedulingIgnoredDuringExecution.
Example for when the app has the label app: mydb
(use whatever fits your case):
podAntiAffinity:
  preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
            - key: app
              operator: In
              values:
                - mydb
        topologyKey: "kubernetes.io/hostname"
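For reference, this block goes under the pod template in the StatefulSet spec. A minimal sketch, assuming the StatefulSet, its headless Service, and its label are all named mydb (names and image are placeholders):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mydb
spec:
  serviceName: mydb          # assumed headless Service name
  replicas: 6
  selector:
    matchLabels:
      app: mydb
  template:
    metadata:
      labels:
        app: mydb
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - mydb
                topologyKey: "kubernetes.io/hostname"
      containers:
        - name: mydb
          image: mydb:latest  # placeholder image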
each node should get exactly 2 pods out of 6 replicas
Try not to think of the pods as pinned to certain nodes. The idea with Kubernetes workloads is that the workload is independent of the underlying infrastructure, such as nodes. What you really want - I assume - is to spread the pods to increase availability, e.g. if one node goes down, your system should still be available.
If you are running on a cloud provider, you should probably design the anti-affinity so that the pods are scheduled to different Availability Zones and not only to different nodes - but that requires your cluster to be deployed in a Region (consisting of multiple Availability Zones).
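For example, the same anti-affinity can be expressed per zone instead of per node by swapping the topologyKey - a sketch, still assuming the label app: mydb and nodes carrying the standard topology.kubernetes.io/zone label:

podAntiAffinity:
  preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
            - key: app
              operator: In
              values:
                - mydb
        topologyKey: "topology.kubernetes.io/zone"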
After even distribution, all 3 nodes (spread over three zones) will have 2 pods each. That is OK. The hard requirement is: if one node (say node-1) goes down, its 2 pods must not be rescheduled onto the other nodes. When node-1 is restored, those 2 pods should be scheduled back onto it. So we can say that all 3 pairs of pods have a different node/zone affinity. Any idea around this?
This can be done with pod affinity,
but is more likely done using topology spread constraints, where you will probably use topologyKey: topology.kubernetes.io/zone
- but this depends on what labels your nodes have.
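For instance, a topology spread constraint that keeps the replicas evenly distributed across zones could look like this in the pod template spec - a sketch, assuming the label app: mydb and the standard zone label (use kubernetes.io/hostname instead if you want to spread per node):

topologySpreadConstraints:
  - maxSkew: 1                                  # allow at most a difference of 1 pod between zones
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule            # hard requirement; ScheduleAnyway would make it a preference
    labelSelector:
      matchLabels:
        app: mydb

With whenUnsatisfiable: DoNotSchedule the spread is enforced at scheduling time; with ScheduleAnyway it is only a preference, similar to preferredDuringSchedulingIgnoredDuringExecution above.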