Kubernetes pods without any affinity suddenly stop scheduling because of MatchInterPodAffinity predicate

7/16/2017

Without any known changes to our Kubernetes 1.6 cluster, all new or restarted pods are no longer being scheduled. The error I get is:

No nodes are available that match all of the following predicates:: MatchInterPodAffinity (10), PodToleratesNodeTaints (2).

Our cluster was working perfectly before, and I really cannot see any configuration changes that were made before this occurred.

Things I already tried:

  • restarting the master node
  • restarting kube-scheduler
  • deleting affected pods, deployments, and stateful sets

Some of the pods have anti-affinity settings that worked before, but most pods do not have any affinity settings at all.
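For reference, the anti-affinity on those pods looks roughly like this (the label key and value are illustrative, not our actual manifests):

```yaml
# Sketch of the pod anti-affinity in use: spread replicas of the
# same app across nodes. "app: my-app" is a placeholder label.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - my-app
      topologyKey: kubernetes.io/hostname
```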

Cluster info:

  • Kubernetes 1.6.2
  • Kops on AWS
  • 1 master, 8 main nodes, 1 tainted data-processing node

Is there any known cause to this?

What are settings and logs I could check that could give more insight?

Is there any possibility to debug the scheduler?

-- tobi_nk360
amazon-ec2
amazon-web-services
kops
kubernetes

1 Answer

7/16/2017

The problem was that a Pod got stuck in deletion, which caused kube-controller-manager to stop working.

Deletion didn't work because the Pod/RS/Deployment in question had resource limits that conflicted with a maxLimitRequestRatio we had set after they were created. A bug report is on the way.
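For context, maxLimitRequestRatio is set in a LimitRange object; a container whose limit-to-request ratio exceeds it is rejected at admission. A minimal sketch of the kind of object involved (the name and ratio values are illustrative, not the ones from our cluster):

```yaml
# Sketch of a LimitRange enforcing a maximum limit/request ratio.
# A container requesting 100m CPU with a 1000m limit (ratio 10)
# would conflict with the cpu ratio of 4 below.
apiVersion: v1
kind: LimitRange
metadata:
  name: ratio-limit
spec:
  limits:
  - type: Container
    maxLimitRequestRatio:
      cpu: "4"
      memory: "2"
```

Because the LimitRange was added after the pods were created, the existing pods violated it only once the controllers tried to recreate them.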

The solution was to increase maxLimitRequestRatio and then restart kube-controller-manager.

-- tobi_nk360
Source: StackOverflow