I am working on migrating my applications to Kubernetes. I am using EKS.
I want to distribute my pods across different nodes to avoid having a single point of failure. I read about pod affinity/anti-affinity and the required and preferred modes.
This answer gives a very nice way to accomplish this.
But here is my doubt: say I have 3 nodes, of which 2 are already full (resource-wise). If I use requiredDuringSchedulingIgnoredDuringExecution, k8s will spin up new nodes and distribute the pods across them. And if I use preferredDuringSchedulingIgnoredDuringExecution, it will check for preferred nodes and, not finding any other node with room, will deploy all pods on the third node only. In that case it again becomes a single point of failure.
How do I solve this condition?
One way I can think of is to have an over-provisioned cluster, so that there are always some extra nodes.
The second way (I am not sure how to do this) would be to somehow use both requiredDuringSchedulingIgnoredDuringExecution and preferredDuringSchedulingIgnoredDuringExecution together, something like the sketch below.
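This is roughly what I have in mind (just a rough sketch of the idea; `app: my-app` is a placeholder label for my deployment):

```yaml
# Rough sketch: both modes in one podAntiAffinity block.
# "app: my-app" is a placeholder label; kubernetes.io/hostname spreads per node.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: my-app
        topologyKey: kubernetes.io/hostname
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: my-app
          topologyKey: kubernetes.io/hostname
```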
Can anyone help me with this? Am I missing something? How do people deal with this situation?
I am new to Kubernetes, so feel free to correct me if I am wrong or missing something.
Thanks in advance
Note:
I don't have a problem with a few similar pods running on the same node; I just don't want all the pods to end up on one node simply because it was the only node available at deploy time.
I see you are trying to make sure that k8s will never schedule all pod replicas on the same node.
It's not possible to create a hard requirement like this for the Kubernetes scheduler.
The scheduler will try its best to spread your application as evenly as possible, but in a situation where you have 2 nodes without spare resources and 1 node where all pod replicas would fit, k8s can do one of the following (depending on configuration):
- leave the extra replicas Pending instead of placing them on the same node (antiaffinity + requiredDuringSchedulingIgnoredDuringExecution)
- spin up a new node and schedule the replicas there (antiaffinity + requiredDuringSchedulingIgnoredDuringExecution + cluster autoscaler)
- evict lower-priority pods to make room (priority based preemption) and reschedule preempted pods if possible

Also read this article to get a better understanding of how the scheduler makes its decisions.
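If you go the preemption route, you would attach a PriorityClass to the deployment whose replicas should win the node, roughly like this (the class name and value here are just examples, not required values):

```yaml
# Example PriorityClass; name and value are placeholders.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "Pods that may preempt lower-priority workloads when nodes are full."
```

Then reference it from the pod template with `priorityClassName: high-priority`.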
You can also use a PodDisruptionBudget to tell Kubernetes to make sure a specified number of replicas is always working. Remember that although:
A disruption budget does not truly guarantee that the specified number/percentage of pods will always be up.
Kubernetes will still take it into consideration when making scheduling decisions.
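For example, a PodDisruptionBudget for the deployment above could look like this (the name, label and minAvailable value are placeholders):

```yaml
# Example PodDisruptionBudget; selector label and minAvailable are placeholders.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app
```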