K8S Namespace tolerations whitelist conflict

1/17/2022
  1. I have been trying to make use of Azure Spot instances on Azure Kubernetes Service (AKS) version 1.19.11, and to enable scheduling of pods onto those nodes I am trying to use the PodTolerationRestriction admission controller.
  2. I can confirm that the PodTolerationRestriction controller is enabled, as I have no issues deploying a ReplicaSet to the default namespace. The workload below is in a different namespace, but we are not specifically adding any tolerations while creating it.
  3. I gathered from elsewhere that, along with whitelisting the toleration for a specific taint (in my case spot), it is also necessary to whitelist certain default tolerations. As a result, I have added the annotations shown below to my namespace.
  4. I do not have any additional tolerations predefined for this StatefulSet.
  5. The node has the following taints; tolerations for the first two are added through the Helm chart values (see the sketch after this list):
    • RabbitMQ=true:NoSchedule
    • Allow=true:NoExecute
    • kubernetes.azure.com/scalesetpriority=spot:NoSchedule
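For reference, tolerations matching all three taints would look roughly like this in the pod spec. This is only a sketch: the RabbitMQ and Allow entries are my reconstruction of what the Helm chart values render, so the exact fields may differ.

tolerations:
  # Assumed shape of the tolerations rendered from the Helm chart values
  - key: "RabbitMQ"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
  - key: "Allow"
    operator: "Equal"
    value: "true"
    effect: "NoExecute"
  # Toleration for the spot taint, matching the namespace default toleration below
  - key: "kubernetes.azure.com/scalesetpriority"
    operator: "Equal"
    value: "spot"
    effect: "NoSchedule"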

I am wondering what additional tolerations need to be whitelisted.

The annotations I added -

scheduler.alpha.kubernetes.io/defaultTolerations: '[{"operator": "Equal", "value": "spot", "key": "kubernetes.azure.com/scalesetpriority"}]'
scheduler.alpha.kubernetes.io/tolerationsWhitelist: '[{"operator": "Equal", "value": "spot", "key": "kubernetes.azure.com/scalesetpriority"}, {"operator": "Exists", "effect": "NoSchedule", "key": "node.kubernetes.io/memory-pressure"}, {"operator": "Exists", "tolerationSeconds": 300, "effect": "NoExecute", "key": "node.kubernetes.io/unreachable"}, {"operator": "Exists", "tolerationSeconds": 300, "effect": "NoExecute", "key": "node.kubernetes.io/not-ready"}]'
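
For clarity, here are the same annotations as a complete Namespace manifest (a sketch; <namespace> is a placeholder):

apiVersion: v1
kind: Namespace
metadata:
  name: <namespace>
  annotations:
    # Toleration merged into every pod in this namespace
    scheduler.alpha.kubernetes.io/defaultTolerations: >-
      [{"operator": "Equal", "value": "spot", "key": "kubernetes.azure.com/scalesetpriority"}]
    # Every toleration on a pod (including merged defaults) must match an entry here
    scheduler.alpha.kubernetes.io/tolerationsWhitelist: >-
      [{"operator": "Equal", "value": "spot", "key": "kubernetes.azure.com/scalesetpriority"},
       {"operator": "Exists", "effect": "NoSchedule", "key": "node.kubernetes.io/memory-pressure"},
       {"operator": "Exists", "tolerationSeconds": 300, "effect": "NoExecute", "key": "node.kubernetes.io/unreachable"},
       {"operator": "Exists", "tolerationSeconds": 300, "effect": "NoExecute", "key": "node.kubernetes.io/not-ready"}]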

StatefulSet describe output -

Name:               <release name>
Namespace:          <namespace>
CreationTimestamp:  Tue, 18 Jan 2022 19:37:38 +0530
Selector:           app.kubernetes.io/instance=<name>,app.kubernetes.io/name=rabbitmq
Labels:             app.kubernetes.io/instance=rabbit
                    app.kubernetes.io/managed-by=Helm
                    app.kubernetes.io/name=rabbitmq
                    helm.sh/chart=rabbitmq-8.6.1
Annotations:        meta.helm.sh/release-name: <release name>
                    meta.helm.sh/release-namespace: <namespace>
Replicas:           3 desired | 0 total
Update Strategy:    RollingUpdate
Pods Status:        0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           app.kubernetes.io/instance=rabbit
                    app.kubernetes.io/managed-by=Helm
                    app.kubernetes.io/name=rabbitmq
                    helm.sh/chart=rabbitmq-8.6.1
  Annotations:      checksum/config: 1a138ded5a3ade049cbee9f4f8e2d0fd7253c126d49b790495a492601fd9f280
                    checksum/secret: 05af38634eb4b46c2f8db5770013e1368e78b0d5af057aed5fa4fe7eec4c92de
                    prometheus.io/port: 9419
                    prometheus.io/scrape: true
  Service Account:  sa-rabbitmq
  Containers:
   rabbitmq:
    Image:       docker.io/bitnami/rabbitmq:3.8.9-debian-10-r64
    Ports:       5672/TCP, 25672/TCP, 15672/TCP, 4369/TCP, 9419/TCP
    Host Ports:  0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP
    Liveness:    exec [/bin/bash -ec rabbitmq-diagnostics -q ping] delay=120s timeout=200s period=30s #success=1 #failure=6
    Readiness:   exec [/bin/bash -ec rabbitmq-diagnostics -q check_running && rabbitmq-diagnostics -q check_local_alarms] delay=10s timeout=200s period=30s #success=1 #failure=3
    Environment:
      <multiple environment variables>
    Mounts:
      /bitnami/rabbitmq/conf from configuration (rw)
      /bitnami/rabbitmq/mnesia from data (rw)
  Volumes:
   configuration:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      rabbit-rabbitmq-config
    Optional:  false
   data:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
Volume Claims:  <none>
Events:
  Type     Reason        Age                 From                    Message
  ----     ------        ----                ----                    -------
  Warning  FailedCreate  31s (x14 over 72s)  statefulset-controller  create Pod <pod-name> in StatefulSet <release name> failed error: pod tolerations (possibly merged with namespace default tolerations) conflict with its namespace whitelist
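
To confirm the annotations are actually present on the namespace, I check with (standard kubectl; <namespace> is the placeholder from above):

kubectl get namespace <namespace> -o jsonpath='{.metadata.annotations}'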
-- Chethan S.
azure-aks
kubernetes
kubernetes-pod
kubernetes-statefulset

0 Answers