Kubernetes podAntiAffinity affects deployment - FailedScheduling - didn't match pod affinity/anti-affinity

11/29/2020

I am running a cluster of 3 nodes, created using kops, on AWS EC2. When I update the deployment with

kubectl set image deployment.v1.apps/app-web app-web=11122333.dkr.ecr.eu-west-1.amazonaws.com/app:$build_number

the newly created pod gets stuck in Pending, while the 3 existing pods stay Running indefinitely.

kubectl describe pod app-web-7b44bb94f6-ftbfg gives:

Warning  FailedScheduling  81s (x11 over 8m25s)  default-scheduler  0/6 nodes are available: 3 node(s) didn't match pod affinity/anti-affinity, 3 node(s) didn't satisfy existing pods anti-affinity rules, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.

kubectl get pods gives:

app-web-7b44bb94f6-ftbfg                            0/1     Pending   0          9m15s
app-web-c977b7ff9-6ngr2                             1/1     Running   0          12h
app-web-c977b7ff9-9lj9c                             1/1     Running   0          12h
app-web-c977b7ff9-klrnv                             1/1     Running   0          12h

I've recently upgraded my cluster from:

# kubectl version --short
Client Version: v1.17.3
Server Version: v1.11.9

to

Client Version: v1.17.3
Server Version: v1.18.10

I've had to upgrade my deployment as well.

apiVersion: apps/v1 # Previously: extensions/v1beta1
kind: Deployment
metadata:
  name: app-web
spec:
  replicas: 3
  selector:  # This selector had to be added once migrated to apps/v1
    matchLabels:
      app: app-web
      role: web
  template:
    metadata:
      labels:
        app: app-web
        role: web
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - app-web
              topologyKey: "kubernetes.io/hostname"

The problem seems to be related to podAntiAffinity, but it wasn't a problem on the previous version of Kubernetes. Should I switch to preferredDuringSchedulingIgnoredDuringExecution instead, or is there another solution to this problem?

-- NeverEndingQueue
deployment
kops
kubernetes

1 Answer

11/29/2020

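The issue is that the default maxUnavailable for the RollingUpdate strategy is 25%. With 3 replicas that is 0.75, which Kubernetes rounds down to 0, so no existing pod may be taken down during the rollout. The default maxSurge of 25% rounds up to 1, so one new pod is created first, but every schedulable node already runs an app-web pod and the required anti-affinity rule leaves it with nowhere to go. I've added this to the deployment spec: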

  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1

and things are back to normal.
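
For context, here is a sketch of where that block sits in the Deployment from the question. The maxSurge: 0 line is an optional addition, not part of the change described above: it should keep the controller from creating an extra pod that the anti-affinity rule would leave Pending until an old pod is removed.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-web
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1  # allow one old pod to be stopped, freeing its node
      maxSurge: 0        # assumption, not in the original fix: replace in place, no extra pod
  # selector and template stay exactly as in the question
```

With either setting, capacity drops to 2 of 3 pods while each replacement starts, which is the trade-off for keeping the strict one-pod-per-host rule.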

I haven't analyzed whether the 25% default has changed between versions; all I know is that this issue didn't affect me on the previous version of Kubernetes.
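
The alternative raised in the question, switching to preferredDuringSchedulingIgnoredDuringExecution, would look roughly like the sketch below. It reuses the labels from the question; the weight of 100 is an assumed value, not something taken from the original deployment.

```yaml
# Sketch only: soft anti-affinity for the app-web pod template (spec.template.spec).
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100  # assumed value; valid range is 1-100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - app-web
          topologyKey: "kubernetes.io/hostname"
```

With the soft form the scheduler still tries to spread pods across hosts, but it may place two app-web pods on the same node when nothing else fits, so the one-pod-per-host guarantee is lost.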

-- NeverEndingQueue
Source: StackOverflow