nodeAffinity & nodeAntiAffinity are ignored

2/12/2021

I am having a problem where I am trying to restrict a deployment so that it avoids a specific node pool, and nodeAffinity and nodeAntiAffinity don't seem to be working.

  • We are running DOKS (Digital Ocean Managed Kubernetes) v1.19.3
  • We have two node pools: infra and clients, with nodes on both labelled as such
  • In this case, we would like to avoid deploying to the nodes labelled "infra"

For whatever reason, no matter what configuration I use, Kubernetes seems to schedule the pods randomly across both node pools.

See the configuration below, and the scheduling results.

deployment.yaml snippet

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wordpress
  namespace: "test"
  labels:
    app: wordpress
    client: "test"
    product: hosted-wordpress
    version: v1
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  selector:
    matchLabels:
      app: wordpress
      client: "test"
  template:
    metadata:
      labels:
        app: wordpress
        client: "test"
        product: hosted-wordpress
        version: v1
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                - key: doks.digitalocean.com/node-pool
                  operator: NotIn
                  values:
                  - infra
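
As a point of comparison, the same "keep off the infra pool" intent can be expressed positively. This is a sketch, not a fix: it assumes the cluster only has the two pools mentioned above, and it pins the pods to the clients pool with the simpler nodeSelector field instead of excluding infra:

```yaml
# Sketch: equivalent hard constraint via nodeSelector
# (assumes clients is the only other pool; label key taken
# from the DOKS node description shown in the question)
    spec:
      nodeSelector:
        doks.digitalocean.com/node-pool: clients
```

Like requiredDuringSchedulingIgnoredDuringExecution, nodeSelector is a hard requirement, so if pods still land on infra nodes with this in place, the constraint is not reaching the scheduler at all.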

node description snippet (note the label 'doks.digitalocean.com/node-pool=infra')

kubectl describe node infra-3dmga

Name:               infra-3dmga
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=s-2vcpu-4gb
                    beta.kubernetes.io/os=linux
                    doks.digitalocean.com/node-id=67d84a52-8d08-4b19-87fe-1d837ba46eb6
                    doks.digitalocean.com/node-pool=infra
                    doks.digitalocean.com/node-pool-id=2e0f2a1d-fbfa-47e9-9136-c897e51c014a
                    doks.digitalocean.com/version=1.19.3-do.2
                    failure-domain.beta.kubernetes.io/region=tor1
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=infra-3dmga
                    kubernetes.io/os=linux
                    node.kubernetes.io/instance-type=s-2vcpu-4gb
                    region=tor1
                    topology.kubernetes.io/region=tor1
Annotations:        alpha.kubernetes.io/provided-node-ip: 10.137.0.230
                    csi.volume.kubernetes.io/nodeid: {"dobs.csi.digitalocean.com":"222551559"}
                    io.cilium.network.ipv4-cilium-host: 10.244.0.139
                    io.cilium.network.ipv4-health-ip: 10.244.0.209
                    io.cilium.network.ipv4-pod-cidr: 10.244.0.128/25
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Sun, 20 Dec 2020 20:17:20 -0800
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  infra-3dmga
  AcquireTime:     <unset>
  RenewTime:       Fri, 12 Feb 2021 08:04:09 -0800
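
A quick sanity check (a sketch, assuming kubectl is configured against this cluster; the label key is the one shown in the node description above) is to confirm both the label values the scheduler will match against and that the affinity block actually made it into the running pod:

```shell
# Print every node with its pool label as an extra column
# (-L renders the value of the given label key)
kubectl get nodes -L doks.digitalocean.com/node-pool

# Dump the affinity stanza of the running pod's effective spec;
# if this prints nothing, the constraint never reached the scheduler
kubectl get pod -n test -l app=wordpress -o jsonpath='{.items[0].spec.affinity}'
```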

sometimes results in

kubectl get po -n test -o wide

NAME                         READY   STATUS    RESTARTS   AGE   IP             NODE          NOMINATED NODE   READINESS GATES
wordpress-5bfcb6f44b-2j7kv   5/5     Running   0          1h   10.244.0.107   infra-3dmga   <none>           <none>

other times results in

kubectl get po -n test -o wide

NAME                         READY   STATUS    RESTARTS   AGE   IP             NODE          NOMINATED NODE   READINESS GATES
wordpress-5bfcb6f44b-b42wj   5/5     Running   0          5m   10.244.0.107   clients-3dmem   <none>           <none>

I have tried expressing this as anti-affinity as well, with the same result.

And lastly, I have even tried creating test labels instead of using the built-in labels from Digital Ocean, and I get the same effect (affinity just doesn't seem to be working for me at all).

I am hoping that someone can help me resolve this, or even point out a silly mistake in my config, because this issue has been driving me nuts (and it is a useful feature, when it works).

Thank you,

-- Joel
kubernetes
kubernetes-deployment

2 Answers

2/12/2021

In the deployment file, you have used operator: NotIn, which works as anti-affinity.

Please use operator: In to achieve node affinity. For instance, if we want the pods to run only on nodes that carry the clients label:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wordpress
  namespace: "test"
  labels:
    app: wordpress
    client: "test"
    product: hosted-wordpress
    version: v1
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  selector:
    matchLabels:
      app: wordpress
      client: "test"
  template:
    metadata:
      labels:
        app: wordpress
        client: "test"
        product: hosted-wordpress
        version: v1
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                - key: "doks.digitalocean.com/node-pool"
                  operator: In
                  values: ["clients"] # Please use the correct label
-- Manjul
Source: StackOverflow

2/18/2021

Great news!

I have finally resolved this issue.

The problem was "user error" of course.

There was an extra spec entry further down in the config that was well hidden.

Originally, before switching to StatefulSets, we were using Deployments, and I had a pod spec hostname entry that was overriding the spec at the top of the file.
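
This class of mistake (a duplicate key silently overriding an earlier one) can be caught before applying the manifest. A linter such as yamllint flags it via its key-duplicates rule; below is a minimal pure-Python sketch of the same idea. It is deliberately crude: it only handles plain block-style mappings, not flow style or keys inside list items.

```python
def find_duplicate_yaml_keys(text):
    """Crude scan for duplicate keys in block-style YAML mappings.

    Tracks the set of keys seen at each indentation level; a real
    linter (e.g. yamllint's key-duplicates rule) handles full YAML.
    """
    stack = []  # list of (indent, set_of_keys_seen_at_that_indent)
    dups = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#") or ":" not in stripped:
            continue
        indent = len(line) - len(line.lstrip())
        key = stripped.split(":", 1)[0]
        # Leaving a nested mapping: drop deeper levels
        while stack and stack[-1][0] > indent:
            stack.pop()
        if stack and stack[-1][0] == indent:
            if key in stack[-1][1]:
                dups.append(key)
            stack[-1][1].add(key)
        else:
            stack.append((indent, {key}))
    return dups

doc = """\
spec:
  hostname: wordpress
  containers:
    - name: wordpress
  hostname: wordpress-old
"""
print(find_duplicate_yaml_keys(doc))  # → ['hostname']
```

Running this over the original Deployment manifest would have surfaced the stray hostname entry immediately.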

Thanks @WytrzymaƂyWiktor and @Manjul for the suggestions!

-- Joel
Source: StackOverflow