Unable to deploy updated Deployment due to nodeAffinity and podAffinity

11/26/2020

So I have 4 nodes: one is System, one is Dev, one is QA, and one is UAT.

My affinity is as follows:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: auth
  namespace: dev
  labels:
    app: auth
    environment: dev
    app-role: api
    tier: backend
spec:
  replicas: 1
  selector:
    matchLabels:
      app: auth
  template:
    metadata:
      labels:
        app: auth
        environment: dev
        app-role: api
        tier: backend
      annotations:
        build: _{Tag}_
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - auth
            topologyKey: kubernetes.io/hostname
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: environment
                operator: In
                values:
                - dev
      containers:
        - name: companyauth
          image: company.azurecr.io/auth:_{Tag}_
          imagePullPolicy: Always
          env:
            - name: ConnectionStrings__DevAuth
              value: dev
          ports:
            - containerPort: 80
      imagePullSecrets:
        - name: ips

My intention is to ensure that on my production cluster, which has 3 nodes in 3 different availability zones, all the pods are scheduled on different nodes/availability zones. However, it appears that if I already have pods scheduled on a node, then a new deployment will not replace the pods that already exist.

0/4 nodes are available: 1 node(s) didn't match pod affinity/anti-affinity, 3 node(s) didn't match node selector.

However, if I remove the podAntiAffinity, it works fine and the new pod from the deployment replaces the one on the current node. What is the correct way to ensure that my deployment on the production cluster always has each pod scheduled on a different node in a different availability zone, while still being able to roll out updates to the existing pods?

-- James
azure-aks
continuous-deployment
deployment
kubernetes

2 Answers

11/26/2020

Your node affinity rule mandates that only the Dev node is considered for scheduling. In combination with your podAntiAffinity rule, this means only one Pod can be scheduled: the one on the Dev node. During a rolling update, the replacement Pod cannot be placed, because the old Pod still occupies the only eligible node.

To get even scheduling across nodes, you will have to add additional Dev nodes or remove the nodeAffinity rule.
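If you need to keep the nodeAffinity, another option (not mentioned in the answers, offered here as a sketch) is to make the anti-affinity a soft preference rather than a hard requirement, so a rolling update is not blocked when the only eligible node is already occupied:

```yaml
# Sketch: preferred (soft) anti-affinity. The scheduler still tries to
# spread pods with app=auth across hostnames, but may co-locate them
# when no other node qualifies (e.g. during a rolling update).
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - auth
        topologyKey: kubernetes.io/hostname
```

The trade-off is that soft anti-affinity no longer guarantees one pod per node; it only biases the scheduler in that direction.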

-- Fritz Duchardt
Source: StackOverflow

12/1/2020

Your goal can be achieved using only PodAntiAffinity.

I have tested this on my GKE test cluster, but it should work similarly on Azure.

Current Issue

In your current setup, you have set podAntiAffinity with nodeAffinity.

Pod anti-affinity can prevent the scheduler from placing a new pod on the same node as existing pods when the label selector on the new pod matches the labels on those pods.

In your Deployment setup, new pods will have labels like:

  • app: auth
  • environment: dev
  • app-role: api
  • tier: backend

PodAntiAffinity was configured to disallow scheduling a new pod on a node that already has a pod with the label app: auth.

NodeAffinity was configured to deploy only on a node with the label environment: dev.

To sum up, your error:

0/4 nodes are available: 1 node(s) didn't match pod affinity/anti-affinity, 3 node(s) didn't match node selector.

1 node(s) didn't match pod affinity/anti-affinity

Your setup allows deploying only on the node with the label environment: dev, and only one pod with the label app: auth can run there.

As you mentioned:

if I already have pods scheduled on a node, then when I do a deployment it will not overwrite the pods that already exist.

The PodAntiAffinity behavior worked as designed and did not allow scheduling a new pod with the label app: auth, as there was already one on the node.

3 node(s) didn't match node selector.

NodeAffinity allows pods to be scheduled only on the node with the label environment: dev. The other nodes probably have labels like environment: system, environment: uat, and environment: qa, which don't match environment: dev and thus didn't match the node selector.

Solution

The easiest way is to remove the NodeAffinity.

As long as topologyKey is set to kubernetes.io/hostname in the PodAntiAffinity, that is enough.

The topologyKey uses a default label attached to every node to dynamically filter on the name of the node.

For more information, please check this article.
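Since the production goal is one pod per availability zone rather than strictly one per node, the same pattern can be keyed on the zone label instead of the hostname. This is a sketch assuming the nodes carry the standard topology.kubernetes.io/zone label (present on most current cloud-provisioned clusters; older clusters may use failure-domain.beta.kubernetes.io/zone instead):

```yaml
# Sketch: anti-affinity keyed on the zone label spreads pods with
# app=auth across availability zones instead of individual nodes.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - auth
      topologyKey: topology.kubernetes.io/zone
```

With a hard requirement like this, the number of schedulable replicas is capped at the number of zones, just as the hostname variant caps it at the number of nodes.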

If you describe your nodes and grep for kubernetes.io/hostname, you will see that each node gets a unique value:

$ kubectl describe node | grep kubernetes.io/hostname
                    kubernetes.io/hostname=gke-affinity-default-pool-27d6eabd-vhss
                    kubernetes.io/hostname=gke-affinity-default-pool-5014ecf7-5tkh
                    kubernetes.io/hostname=gke-affinity-default-pool-c2afcc97-clg9

Tests

apiVersion: apps/v1
kind: Deployment
metadata:
  name: auth
  labels:
    app: auth
    environment: dev
    app-role: api
    tier: backend
spec:
  replicas: 3
  selector:
    matchLabels:
      app: auth
  template:
    metadata:
      labels:
        app: auth
        environment: dev
        app-role: api
        tier: backend
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - auth
            topologyKey: kubernetes.io/hostname
      containers:
        - name: nginx
          image: nginx
          imagePullPolicy: Always
          ports:
            - containerPort: 80

After deploying this YAML:

$ kubectl get po -o wide
NAME                    READY   STATUS    RESTARTS   AGE   IP         NODE                                      NOMINATED NODE   READINESS GATES
auth-7fccf5f7b8-4dkc4   1/1     Running   0          9s    10.0.1.9   gke-affinity-default-pool-c2afcc97-clg9   <none>           <none>
auth-7fccf5f7b8-5qgt4   1/1     Running   0          8s    10.0.2.6   gke-affinity-default-pool-5014ecf7-5tkh   <none>           <none>
auth-7fccf5f7b8-bdmtw   1/1     Running   0          8s    10.0.0.9   gke-affinity-default-pool-27d6eabd-vhss   <none>           <none>

If you increase replicas to 7, no more pods will be deployed. All new pods will be stuck in the Pending state, as podAntiAffinity worked (each node already has a pod with the label app: auth).

$ kubectl get po -o wide
NAME                    READY   STATUS    RESTARTS   AGE    IP         NODE                                      NOMINATED NODE   READINESS GATES
auth-7fccf5f7b8-4299k   0/1     Pending   0          79s    <none>     <none>                                    <none>           <none>
auth-7fccf5f7b8-4dkc4   1/1     Running   0          2m1s   10.0.1.9   gke-affinity-default-pool-c2afcc97-clg9   <none>           <none>
auth-7fccf5f7b8-556h5   0/1     Pending   0          78s    <none>     <none>                                    <none>           <none>
auth-7fccf5f7b8-5qgt4   1/1     Running   0          2m     10.0.2.6   gke-affinity-default-pool-5014ecf7-5tkh   <none>           <none>
auth-7fccf5f7b8-bdmtw   1/1     Running   0          2m     10.0.0.9   gke-affinity-default-pool-27d6eabd-vhss   <none>           <none>
auth-7fccf5f7b8-q4s2c   0/1     Pending   0          79s    <none>     <none>                                    <none>           <none>
auth-7fccf5f7b8-twb9j   0/1     Pending   0          79s    <none>     <none>                                    <none>           <none>

A similar solution is described in the High-Availability Deployment of Pods on Multi-Zone Worker Nodes blog post.

-- PjoterS
Source: StackOverflow