cluster-autoscaler deployment fails with "1 Too many pods, 3 node(s) didn't match Pod's node affinity/selector"

1/6/2022

I created a k8s cluster with kops (1.21.4) on AWS and made the changes required by the cluster-autoscaler docs. However, when the cluster starts, the cluster-autoscaler pod cannot be scheduled on any node. When I describe the pod, I see the following:

Events:
  Type     Reason            Age                   From               Message
  ----     ------            ----                  ----               -------
  Warning  FailedScheduling  4m31s (x92 over 98m)  default-scheduler  0/4 nodes are available: 1 Too many pods, 3 node(s) didn't match Pod's node affinity/selector.

Looking at the cluster-autoscaler deployment, I see the following podAntiAffinity:

      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - cluster-autoscaler
              topologyKey: topology.kubernetes.io/zone
            weight: 100
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - cluster-autoscaler
            topologyKey: kubernetes.com/hostname

From this I understand that it is meant to prevent a pod from running on a node that already has a cluster-autoscaler pod. But that doesn't seem to explain the error in the pod status.

Edit: The pod for autoscaler has the following nodeSelectors and tolerations:

Node-Selectors:              node-role.kubernetes.io/master=
Tolerations:                 node-role.kubernetes.io/master op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s

So clearly, it should be able to schedule on the master node too.

I am not sure what else I need to do to get the pod up and running.

-- Divick
aws-auto-scaling
kops
kubernetes

2 Answers

1/6/2022

You need to check the nodeSelector property of the pod/deployment. Make sure that the nodes you want it to run on have the matching label.

Also, if you want to schedule pods on the master node, the pod needs a toleration for the master taint, or you must remove the taint first:

kubectl taint nodes --all node-role.kubernetes.io/master-
-- Rakesh Gupta
Source: StackOverflow

1/10/2022

Posting the answer from the comments.


There are podAntiAffinity rules in place, so the first thing to check is whether any scheduling errors are reported, which is the case:

0/4 nodes are available: 1 Too many pods, 3 node(s) didn't match Pod's node affinity/selector.

Since there is 1 control plane node (on which the pod is supposed to be scheduled) and 3 worker nodes, the "1 Too many pods" part of the error refers to the control plane node.


Since the cluster is running on AWS, there is a known limit on the number of network interfaces and private IP addresses per instance type (see "IP addresses per network interface per instance type").

A t3.small was used, which has 3 interfaces and 4 IPs per interface = 12 in total, which was not enough.

Scaling up to t3.medium resolved the issue.
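
As a rough illustration of the arithmetic, here is a minimal sketch. It assumes the cluster uses the AWS VPC CNI (not stated in the question), where the usual pod limit is ENIs * (IPs per ENI - 1) + 2. The t3.medium figures (3 interfaces, 6 IPs each) are taken from the AWS instance type table, and the names ENI_LIMITS and max_pods are made up for this example.

```python
# (max network interfaces, IPv4 addresses per interface) per instance type.
# t3.small is from the answer above; t3.medium is from the AWS instance type table.
ENI_LIMITS = {
    "t3.small": (3, 4),
    "t3.medium": (3, 6),
}


def max_pods(instance_type: str) -> int:
    """Pod limit under the AWS VPC CNI formula (an assumption about this cluster).

    Each interface reserves one IP for itself; the +2 accounts for
    host-network pods that do not consume an interface IP.
    """
    enis, ips_per_eni = ENI_LIMITS[instance_type]
    return enis * (ips_per_eni - 1) + 2


for itype, (enis, ips) in ENI_LIMITS.items():
    print(f"{itype}: {enis * ips} IPs in total, up to {max_pods(itype)} pods")
```

Under these assumptions a t3.small tops out at 11 pods and a t3.medium at 17, which would explain why the control plane node reported "Too many pods" until the instance size was increased.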


Credits to Jonas's answer about the root cause.

-- moonkotte
Source: StackOverflow