I have the following anti-affinity rule configured in my k8s Deployment:
spec:
  ...
  selector:
    matchLabels:
      app: my-app
      environment: qa
  ...
  template:
    metadata:
      labels:
        app: my-app
        environment: qa
        version: v0
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - my-app
            topologyKey: kubernetes.io/hostname
With this rule I say that I do not want any Pod replica to be scheduled on a node of my k8s cluster where a Pod of the same application is already present. So, for instance, having:
nodes(a,b,c) = 3
replicas(1,2,3) = 3
replica_1 scheduled in node_a, replica_2 scheduled in node_b and replica_3 scheduled in node_c
As such, each Pod is scheduled on a different node.
However, I was wondering if there is a way to specify: "I want to spread my Pods across at least 2 nodes" to guarantee high availability, without forcing every Pod onto a different node, for example:
nodes(a,b,c) = 3
replicas(1,2,3) = 3
replica_1 scheduled in node_a, replica_2 scheduled in node_b and replica_3 scheduled (again) in node_a
So, to sum up, I would like a softer constraint that allows me to guarantee high availability by spreading the Deployment's replicas across at least 2 nodes, without having to launch a node for each Pod of a given application.
Thanks!
I think I found a solution to your problem. Look at this example YAML snippet:
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        example: app
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - worker-1
            - worker-2
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 50
        preference:
          matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - worker-1
The idea of this configuration: I'm using nodeAffinity here to indicate on which nodes a Pod can be placed:
- key: kubernetes.io/hostname
and
values:
- worker-1
- worker-2
It is important to set the following line:
- maxSkew: 1
According to the documentation:
maxSkew describes the degree to which Pods may be unevenly distributed. It must be greater than zero.
Thanks to this, the difference in the number of Pods assigned to any two nodes will never exceed 1.
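One thing to watch out for: the labelSelector inside the spread constraint has to match the labels of the Pods you want to count. Adapted to the Deployment from the question, it would look roughly like this (a minimal sketch, assuming the app: my-app label from your Pod template):
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: my-app
Combined with the nodeAffinity above, which restricts scheduling to worker-1 and worker-2, your 3 replicas end up split 2/1 across those two nodes, which is the kind of placement you described.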
This section:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 50
  preference:
    matchExpressions:
    - key: kubernetes.io/hostname
      operator: In
      values:
      - worker-1
is optional; however, it allows you to fine-tune the Pod distribution across the allowed nodes even further. The Kubernetes documentation describes the difference between requiredDuringSchedulingIgnoredDuringExecution and preferredDuringSchedulingIgnoredDuringExecution:
Thus an example of requiredDuringSchedulingIgnoredDuringExecution would be "only run the pod on nodes with Intel CPUs" and an example preferredDuringSchedulingIgnoredDuringExecution would be "try to run this set of pods in failure zone XYZ, but if it's not possible, then allow some to run elsewhere".
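An alternative, if you do not want to pin Pods to specific node names at all, is soft anti-affinity: take the required podAntiAffinity rule from the question and move it under preferredDuringSchedulingIgnoredDuringExecution. This is only a sketch of that idea, reusing the app: my-app label from the question, and it is a best-effort rule rather than a guarantee: the scheduler tries to put every replica on a separate node and only co-locates them when it has to, so in the worst case the replicas can still end up on fewer than 2 nodes.
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - my-app
          topologyKey: kubernetes.io/hostname
It does not give the hard "at least 2 nodes" guarantee of the configuration above, but it keeps the scheduler free to use any node in the cluster.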