Let's say I have 3 nodes in my cluster and I want to run 300 jobs. If I run 1 job per pod and 100 pods per node, what will happen if a node fails in Azure Kubernetes Service?
The Cluster Autoscaler (CA) can be used to handle node failures in Azure using virtual machine scale sets:
Those Jobs will go to Pending. Kubernetes supports at most 110 pods per node by default, so with 100 pods already on each of the two surviving nodes there is room for only about 10 more per node, which is not enough for the 100 pods displaced from the failed node. You could look at using the Cluster Autoscaler (Beta), which would provision more hosts to run the Jobs that are stuck in the Pending state.
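To make the arithmetic concrete, here is a rough sketch of the capacity calculation. It assumes the 110 pods-per-node limit mentioned above and looks only at pod slots: it ignores CPU and memory requests and any system pods that also count toward the limit, so the real number of free slots is likely to be lower. The function name and its parameters are made up for illustration.

```python
def pending_after_node_failure(nodes, pods_per_node, max_pods_per_node, failed_nodes=1):
    """Return (rescheduled, pending) pod counts after `failed_nodes` nodes are lost.

    Only pod-count limits are modelled; CPU/memory requests and system pods
    are deliberately ignored (an assumption of this sketch).
    """
    surviving = nodes - failed_nodes
    displaced = failed_nodes * pods_per_node
    # Free pod slots left on the nodes that are still healthy.
    free_slots = surviving * max(0, max_pods_per_node - pods_per_node)
    rescheduled = min(displaced, free_slots)
    return rescheduled, displaced - rescheduled


# Scenario from the question: 3 nodes, 100 pods each, 110-pod limit, 1 node lost.
rescheduled, pending = pending_after_node_failure(
    nodes=3, pods_per_node=100, max_pods_per_node=110
)
print(f"rescheduled={rescheduled}, pending={pending}")
```

Under these assumptions at most 20 of the displaced pods can be rescheduled and the other 80 stay Pending until more capacity is added, for example by the autoscaler.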