I am running a Kubernetes cluster with 2 workers and approximately 100 deployments, each with 2 or 4 replicas (so roughly 300 pods per worker; yes, it's a lot of pods).
My problem: when a worker goes down, Kubernetes tries to reschedule every failing pod onto the remaining alive node. At the end of the operation I have:

- the surviving worker node running 600 pods
- master nodes under very heavy load because they are rescheduling 300 pods
- when the failed worker comes back, it stays empty, because all the pods are now on the other worker node
The only workaround I found is to create 2 deployments for every application (one pinned to each worker) to prevent the rescheduling of 300 pods.
Are there better solutions?
Yes, one way to approach this for 2-replica deployments is to use Pod Anti-Affinity, declaring that pods from a given deployment cannot coexist on the same node. The scheduler will then place at most one pod of the deployment per node and leave the rest in the Pending state until new nodes become available.
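A minimal sketch of what that looks like, assuming a Deployment whose pods carry the (hypothetical) label `app: my-app`:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app            # hypothetical name for illustration
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      affinity:
        podAntiAffinity:
          # Hard rule: the scheduler refuses to place two pods with the
          # label app=my-app on the same node (same kubernetes.io/hostname).
          # With 2 workers, at most one replica runs per node; any extra
          # replicas stay Pending until another node is available.
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: my-app
              topologyKey: kubernetes.io/hostname
      containers:
        - name: my-app
          image: my-app:latest   # placeholder image
```

Note that a hard `requiredDuringSchedulingIgnoredDuringExecution` rule on a 2-node cluster will leave the extra replicas of your 4-replica deployments Pending. If you only want to spread pods across nodes rather than strictly forbid co-location, the soft variant `preferredDuringSchedulingIgnoredDuringExecution` expresses a scheduling preference instead.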