Without any known changes to our Kubernetes 1.6 cluster, new or restarted pods are no longer scheduled. The error I get is:
No nodes are available that match all of the following predicates:: MatchInterPodAffinity (10), PodToleratesNodeTaints (2).
Our cluster was working perfectly before, and I cannot find any configuration changes that were made before this occurred.
Things I already tried:
Some of the pods do have anti-affinity settings that worked before, but most pods do not have any affinity settings.
Cluster Infos:
Is there any known cause for this?
Which settings and logs could I check to get more insight?
Is there any way to debug the scheduler?
The problem was that a Pod got stuck in deletion. That caused kube-controller-manager to stop working.
Deletion didn't work because the Pod/RS/Deployment in question had limits that conflicted with the maxLimitRequestRatio that we had set after those objects were created. A bug report is on the way.
The solution was to increase maxLimitRequestRatio and eventually restart kube-controller-manager.
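To illustrate the kind of conflict described above, here is a minimal Python sketch of the ratio rule that a LimitRange's maxLimitRequestRatio expresses: a container's limit divided by its request must not exceed the configured ratio. This is a simplified illustration, not the actual admission code in Kubernetes. The function names (`parse_cpu`, `violates_ratio`) are hypothetical, only plain and millicore CPU quantities are handled, and the values in the usage lines are made up rather than taken from our cluster.

```python
from fractions import Fraction


def parse_cpu(quantity: str) -> Fraction:
    """Parse a CPU quantity such as '500m' or '2' into a number of cores.

    Assumption: only plain numbers and the millicore suffix 'm' are supported.
    """
    if quantity.endswith("m"):
        return Fraction(int(quantity[:-1]), 1000)
    return Fraction(quantity)


def violates_ratio(request: str, limit: str, max_ratio: str) -> bool:
    """Return True if limit / request is greater than max_ratio."""
    ratio = parse_cpu(limit) / parse_cpu(request)
    return ratio > Fraction(max_ratio)


# Illustrative values: a container created with request=100m and limit=500m
# has a ratio of 5. If maxLimitRequestRatio is later set to 2, the existing
# object no longer satisfies the rule.
print(violates_ratio("100m", "500m", "2"))  # ratio 5 > 2
print(violates_ratio("250m", "500m", "2"))  # ratio 2 is not > 2
```

Objects that already exceed a newly introduced ratio are the ones worth checking first if a deletion hangs in a similar way.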