NoExecuteTaintManager falsely deleting Pod?

10/25/2019

I am receiving NoExecuteTaintManager events that are deleting my Pods, but I can't figure out why. The node is healthy and the Pod has the appropriate tolerations.

This is actually causing an infinite scale-up, because my Pod is set up to request 3 of the node's 4 CPUs and has a Termination Grace Period > 0. A terminating Pod therefore keeps its node occupied, so every replacement Pod (the ReplicaSet keeps replicas == 2) forces the Cluster Autoscaler to add a new node.
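
Here is roughly how I have been watching the loop, for context (node name as in the dumps below; just a sketch):

# Old replicas stay Terminating (and keep their 3-CPU request) while replacements appear
kubectl -n prod get pods -o wide -w | grep my-pod

# The terminating pod is still counted against the node, so the replacement cannot fit there
kubectl describe node ip-10-142-54-235.ec2.internal | grep -A 8 'Non-terminated Pods'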

How do I figure out which taint is causing it specifically? And why does the taint manager think the node had that taint? Currently the pods are being killed at exactly 600 seconds, which is the tolerationSeconds I set for node.kubernetes.io/unreachable and node.kubernetes.io/not-ready, yet the node does not appear to be in either of those states.
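
A sketch of what I have tried so far to catch the taint in the act (node and namespace names as in the output below):

# Poll the node's taints so a transient not-ready/unreachable taint shows up with a timestamp
while true; do
  echo "$(date -u +%H:%M:%S)  $(kubectl get node ip-10-142-54-235.ec2.internal -o jsonpath='{.spec.taints}')"
  sleep 5
done

# Correlate the eviction events with the node's condition transitions
kubectl -n prod get events --field-selector reason=TaintManagerEviction --sort-by=.lastTimestamp
kubectl get node ip-10-142-54-235.ec2.internal \
  -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.lastTransitionTime}{"\n"}{end}'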

NAME                                     READY   STATUS        RESTARTS   AGE
my-api-67df7bd54c-dthbn                  1/1     Running       0          8d
my-api-67df7bd54c-mh564                  1/1     Running       0          8d
my-pod-6d7b698b5f-28rgw                  1/1     Terminating   0          15m
my-pod-6d7b698b5f-2wmmg                  1/1     Terminating   0          13m
my-pod-6d7b698b5f-4lmmg                  1/1     Running       0          4m32s
my-pod-6d7b698b5f-7m4gh                  1/1     Terminating   0          71m
my-pod-6d7b698b5f-8b47r                  1/1     Terminating   0          27m
my-pod-6d7b698b5f-bb58b                  1/1     Running       0          2m29s
my-pod-6d7b698b5f-dn26n                  1/1     Terminating   0          25m
my-pod-6d7b698b5f-jrnkg                  1/1     Terminating   0          38m
my-pod-6d7b698b5f-sswps                  1/1     Terminating   0          36m
my-pod-6d7b698b5f-vhqnf                  1/1     Terminating   0          59m
my-pod-6d7b698b5f-wkrtg                  1/1     Terminating   0          50m
my-pod-6d7b698b5f-z6p2c                  1/1     Terminating   0          47m
my-pod-6d7b698b5f-zplp6                  1/1     Terminating   0          62m
14:22:43.678937 8 taint_manager.go:102] NoExecuteTaintManager is deleting Pod: my-pod-6d7b698b5f-dn26n
14:22:43.679073 8 event.go:221] Event(v1.ObjectReference{Kind:"Pod", Namespace:"prod", Name:"my-pod-6d7b698b5f-dn26n", UID:"", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'TaintManagerEviction' Marking for deletion Pod prod/my-pod-6d7b698b5f-dn26n
# kubectl -n prod get pod my-pod-6d7b698b5f-8b47r -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    checksum/config: bcdc41c616f736849a6bef9c726eec9bf704ce7d2c61736005a6fedda0ee14d0
    kubernetes.io/psp: eks.privileged
  creationTimestamp: "2019-10-25T14:09:17Z"
  deletionGracePeriodSeconds: 172800
  deletionTimestamp: "2019-10-27T14:20:40Z"
  generateName: my-pod-6d7b698b5f-
  labels:
    app.kubernetes.io/instance: my-pod
    app.kubernetes.io/name: my-pod
    pod-template-hash: 6d7b698b5f
  name: my-pod-6d7b698b5f-8b47r
  namespace: prod
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: my-pod-6d7b698b5f
    uid: c6360643-f6a6-11e9-9459-12ff96456b32
  resourceVersion: "2408256"
  selfLink: /api/v1/namespaces/prod/pods/my-pod-6d7b698b5f-8b47r
  uid: 08197175-f731-11e9-9459-12ff96456b32
spec:
  containers:
  - args:
    - -c
    - from time import sleep; sleep(10000)
    command:
    - python
    envFrom:
    - secretRef:
        name: pix4d
    - secretRef:
        name: rabbitmq
    image: python:3.7-buster
    imagePullPolicy: Always
    name: my-pod
    ports:
    - containerPort: 5000
      name: http
      protocol: TCP
    resources:
      requests:
        cpu: "3"
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-gv6q5
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: ip-10-142-54-235.ec2.internal
  nodeSelector:
    nodepool: zeroscaling-gpu-accelerated-p2-xlarge
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 172800
  tolerations:
  - key: specialized
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 600
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 600
  volumes:
  - name: default-token-gv6q5
    secret:
      defaultMode: 420
      secretName: default-token-gv6q5
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2019-10-25T14:10:40Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2019-10-25T14:11:09Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2019-10-25T14:11:09Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2019-10-25T14:10:40Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://15e2e658c459a91a86573c1096931fa4ac345e06f26652da2a58dc3e3b3d5aa2
    image: python:3.7-buster
    imageID: docker-pullable://python@sha256:f0db6711abee8d406121c9e057bc0f7605336e8148006164fea2c43809fe7977
    lastState: {}
    name: my-pod
    ready: true
    restartCount: 0
    state:
      running:
        startedAt: "2019-10-25T14:11:09Z"
  hostIP: 10.142.54.235
  phase: Running
  podIP: 10.142.63.233
  qosClass: Burstable
  startTime: "2019-10-25T14:10:40Z"
# kubectl -n prod describe pod my-pod-6d7b698b5f-8b47r   
Name:                      my-pod-6d7b698b5f-8b47r
Namespace:                 prod
Priority:                  0
PriorityClassName:         <none>
Node:                      ip-10-142-54-235.ec2.internal/10.142.54.235
Start Time:                Fri, 25 Oct 2019 10:10:40 -0400
Labels:                    app.kubernetes.io/instance=my-pod
                           app.kubernetes.io/name=my-pod
                           pod-template-hash=6d7b698b5f
Annotations:               checksum/config: bcdc41c616f736849a6bef9c726eec9bf704ce7d2c61736005a6fedda0ee14d0
                           kubernetes.io/psp: eks.privileged
Status:                    Terminating (lasts 47h)
Termination Grace Period:  172800s
IP:                        10.142.63.233
Controlled By:             ReplicaSet/my-pod-6d7b698b5f
Containers:
  my-pod:
    Container ID:  docker://15e2e658c459a91a86573c1096931fa4ac345e06f26652da2a58dc3e3b3d5aa2
    Image:         python:3.7-buster
    Image ID:      docker-pullable://python@sha256:f0db6711abee8d406121c9e057bc0f7605336e8148006164fea2c43809fe7977
    Port:          5000/TCP
    Host Port:     0/TCP
    Command:
      python
    Args:
      -c
      from time import sleep; sleep(10000)
    State:          Running
      Started:      Fri, 25 Oct 2019 10:11:09 -0400
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:  3
    Environment Variables from:
      pix4d       Secret  Optional: false
      rabbitmq    Secret  Optional: false
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-gv6q5 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  default-token-gv6q5:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-gv6q5
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  nodepool=zeroscaling-gpu-accelerated-p2-xlarge
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 600s
                 node.kubernetes.io/unreachable:NoExecute for 600s
                 specialized
Events:
  Type     Reason            Age                From                                    Message
  ----     ------            ----               ----                                    -------
  Warning  FailedScheduling  12m (x2 over 12m)  default-scheduler                       0/13 nodes are available: 1 Insufficient pods, 13 Insufficient cpu, 6 node(s) didn't match node selector.
  Normal   TriggeredScaleUp  12m                cluster-autoscaler                      pod triggered scale-up: [{prod-worker-gpu-accelerated-p2-xlarge 7->8 (max: 13)}]
  Warning  FailedScheduling  11m (x5 over 11m)  default-scheduler                       0/14 nodes are available: 1 Insufficient pods, 1 node(s) had taints that the pod didn't tolerate, 13 Insufficient cpu, 6 node(s) didn't match node selector.
  Normal   Scheduled         11m                default-scheduler                       Successfully assigned prod/my-pod-6d7b698b5f-8b47r to ip-10-142-54-235.ec2.internal
  Normal   Pulling           11m                kubelet, ip-10-142-54-235.ec2.internal  pulling image "python:3.7-buster"
  Normal   Pulled            10m                kubelet, ip-10-142-54-235.ec2.internal  Successfully pulled image "python:3.7-buster"
  Normal   Created           10m                kubelet, ip-10-142-54-235.ec2.internal  Created container
  Normal   Started           10m                kubelet, ip-10-142-54-235.ec2.internal  Started container
# kubectl -n prod describe node ip-10-142-54-235.ec2.internal
Name:               ip-10-142-54-235.ec2.internal
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=p2.xlarge
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=us-east-1
                    failure-domain.beta.kubernetes.io/zone=us-east-1b
                    kubernetes.io/hostname=ip-10-142-54-235.ec2.internal
                    nodepool=zeroscaling-gpu-accelerated-p2-xlarge
Annotations:        node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Fri, 25 Oct 2019 10:10:20 -0400
Taints:             specialized=true:NoExecute
Unschedulable:      false
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Fri, 25 Oct 2019 10:23:11 -0400   Fri, 25 Oct 2019 10:10:19 -0400   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Fri, 25 Oct 2019 10:23:11 -0400   Fri, 25 Oct 2019 10:10:19 -0400   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Fri, 25 Oct 2019 10:23:11 -0400   Fri, 25 Oct 2019 10:10:19 -0400   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Fri, 25 Oct 2019 10:23:11 -0400   Fri, 25 Oct 2019 10:10:40 -0400   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:   10.142.54.235
  ExternalIP:   3.86.112.24
  Hostname:     ip-10-142-54-235.ec2.internal
  InternalDNS:  ip-10-142-54-235.ec2.internal
  ExternalDNS:  ec2-3-86-112-24.compute-1.amazonaws.com
Capacity:
 attachable-volumes-aws-ebs:  39
 cpu:                         4
 ephemeral-storage:           209702892Ki
 hugepages-1Gi:               0
 hugepages-2Mi:               0
 memory:                      62872868Ki
 pods:                        58
Allocatable:
 attachable-volumes-aws-ebs:  39
 cpu:                         4
 ephemeral-storage:           200777747706
 hugepages-1Gi:               0
 hugepages-2Mi:               0
 memory:                      61209892Ki
 pods:                        58
System Info:
 Machine ID:                 0e76fec3e06d41a6bf2c49a18fbe1795
 System UUID:                EC29973A-D616-F673-6899-A96C97D5AE2D
 Boot ID:                    4bc510b6-f615-48a7-9e1e-47261ddf26a4
 Kernel Version:             4.14.146-119.123.amzn2.x86_64
 OS Image:                   Amazon Linux 2
 Operating System:           linux
 Architecture:               amd64
 Container Runtime Version:  docker://18.6.1
 Kubelet Version:            v1.13.11-eks-5876d6
 Kube-Proxy Version:         v1.13.11-eks-5876d6
ProviderID:                  aws:///us-east-1b/i-0f5b519aa6e38e04a
Non-terminated Pods:         (5 in total)
  Namespace                  Name                          CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                  ----                          ------------  ----------  ---------------  -------------  ---
  amazon-cloudwatch          cloudwatch-agent-4d24j        50m (1%)      250m (6%)   50Mi (0%)        250Mi (0%)     12m
  amazon-cloudwatch          fluentd-cloudwatch-wkslq      50m (1%)      0 (0%)      150Mi (0%)       300Mi (0%)     12m
  prod                       my-pod-6d7b698b5f-8b47r       3 (75%)       0 (0%)      0 (0%)           0 (0%)         14m
  kube-system                aws-node-6nr6g                10m (0%)      0 (0%)      0 (0%)           0 (0%)         13m
  kube-system                kube-proxy-wf8k4              100m (2%)     0 (0%)      0 (0%)           0 (0%)         13m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                    Requests     Limits
  --------                    --------     ------
  cpu                         3210m (80%)  250m (6%)
  memory                      200Mi (0%)   550Mi (0%)
  ephemeral-storage           0 (0%)       0 (0%)
  attachable-volumes-aws-ebs  0            0
Events:
  Type    Reason                   Age                From                                       Message
  ----    ------                   ----               ----                                       -------
  Normal  Starting                 13m                kubelet, ip-10-142-54-235.ec2.internal     Starting kubelet.
  Normal  NodeHasSufficientMemory  13m (x2 over 13m)  kubelet, ip-10-142-54-235.ec2.internal     Node ip-10-142-54-235.ec2.internal status is now: NodeHasSufficientMemory
  Normal  NodeHasNoDiskPressure    13m (x2 over 13m)  kubelet, ip-10-142-54-235.ec2.internal     Node ip-10-142-54-235.ec2.internal status is now: NodeHasNoDiskPressure
  Normal  NodeHasSufficientPID     13m (x2 over 13m)  kubelet, ip-10-142-54-235.ec2.internal     Node ip-10-142-54-235.ec2.internal status is now: NodeHasSufficientPID
  Normal  NodeAllocatableEnforced  13m                kubelet, ip-10-142-54-235.ec2.internal     Updated Node Allocatable limit across pods
  Normal  Starting                 12m                kube-proxy, ip-10-142-54-235.ec2.internal  Starting kube-proxy.
  Normal  NodeReady                12m                kubelet, ip-10-142-54-235.ec2.internal     Node ip-10-142-54-235.ec2.internal status is now: NodeReady
# kubectl get node ip-10-142-54-235.ec2.internal -o yaml
apiVersion: v1
kind: Node
metadata:
  annotations:
    node.alpha.kubernetes.io/ttl: "0"
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: "2019-10-25T14:10:20Z"
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/instance-type: p2.xlarge
    beta.kubernetes.io/os: linux
    failure-domain.beta.kubernetes.io/region: us-east-1
    failure-domain.beta.kubernetes.io/zone: us-east-1b
    kubernetes.io/hostname: ip-10-142-54-235.ec2.internal
    nodepool: zeroscaling-gpu-accelerated-p2-xlarge
  name: ip-10-142-54-235.ec2.internal
  resourceVersion: "2409195"
  selfLink: /api/v1/nodes/ip-10-142-54-235.ec2.internal
  uid: 2d934979-f731-11e9-89b8-0234143df588
spec:
  providerID: aws:///us-east-1b/i-0f5b519aa6e38e04a
  taints:
  - effect: NoExecute
    key: specialized
    value: "true"
status:
  addresses:
  - address: 10.142.54.235
    type: InternalIP
  - address: 3.86.112.24
    type: ExternalIP
  - address: ip-10-142-54-235.ec2.internal
    type: Hostname
  - address: ip-10-142-54-235.ec2.internal
    type: InternalDNS
  - address: ec2-3-86-112-24.compute-1.amazonaws.com
    type: ExternalDNS
  allocatable:
    attachable-volumes-aws-ebs: "39"
    cpu: "4"
    ephemeral-storage: "200777747706"
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 61209892Ki
    pods: "58"
  capacity:
    attachable-volumes-aws-ebs: "39"
    cpu: "4"
    ephemeral-storage: 209702892Ki
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 62872868Ki
    pods: "58"
  conditions:
  - lastHeartbeatTime: "2019-10-25T14:23:51Z"
    lastTransitionTime: "2019-10-25T14:10:19Z"
    message: kubelet has sufficient memory available
    reason: KubeletHasSufficientMemory
    status: "False"
    type: MemoryPressure
  - lastHeartbeatTime: "2019-10-25T14:23:51Z"
    lastTransitionTime: "2019-10-25T14:10:19Z"
    message: kubelet has no disk pressure
    reason: KubeletHasNoDiskPressure
    status: "False"
    type: DiskPressure
  - lastHeartbeatTime: "2019-10-25T14:23:51Z"
    lastTransitionTime: "2019-10-25T14:10:19Z"
    message: kubelet has sufficient PID available
    reason: KubeletHasSufficientPID
    status: "False"
    type: PIDPressure
  - lastHeartbeatTime: "2019-10-25T14:23:51Z"
    lastTransitionTime: "2019-10-25T14:10:40Z"
    message: kubelet is posting ready status
    reason: KubeletReady
    status: "True"
    type: Ready
  daemonEndpoints:
    kubeletEndpoint:
      Port: 10250
  images:
  - names:
    - python@sha256:f0db6711abee8d406121c9e057bc0f7605336e8148006164fea2c43809fe7977
    - python:3.7-buster
    sizeBytes: 917672801
  - names:
    - 602401143452.dkr.ecr.us-east-1.amazonaws.com/amazon-k8s-cni@sha256:5b7e7435f88a86bbbdb2a5ecd61e893dc14dd13c9511dc8ace362d299259700a
    - 602401143452.dkr.ecr.us-east-1.amazonaws.com/amazon-k8s-cni:v1.5.4
    sizeBytes: 290739356
  - names:
    - fluent/fluentd-kubernetes-daemonset@sha256:582770d951f81e0971e852089239ced0186e0bdc3226daf16b99ca4cc22de4f7
    - fluent/fluentd-kubernetes-daemonset:v1.3.3-debian-cloudwatch-1.4
    sizeBytes: 261867521
  - names:
    - amazon/cloudwatch-agent@sha256:877106acbc56e747ebe373548c88cd37274f666ca11b5c782211db4c5c7fb64b
    - amazon/cloudwatch-agent:latest
    sizeBytes: 131360039
  - names:
    - 602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/kube-proxy@sha256:4767b441ddc424b0ea63c305b79be154f65fb15ebefe8a3b2832ce55aa6de2f0
    - 602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/kube-proxy:v1.13.8
    sizeBytes: 80183964
  - names:
    - busybox@sha256:fe301db49df08c384001ed752dff6d52b4305a73a7f608f21528048e8a08b51e
    - busybox:latest
    sizeBytes: 1219782
  - names:
    - 602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/pause-amd64@sha256:bea77c323c47f7b573355516acf927691182d1333333d1f41b7544012fab7adf
    - 602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/pause-amd64:3.1
    sizeBytes: 742472
  nodeInfo:
    architecture: amd64
    bootID: 4bc510b6-f615-48a7-9e1e-47261ddf26a4
    containerRuntimeVersion: docker://18.6.1
    kernelVersion: 4.14.146-119.123.amzn2.x86_64
    kubeProxyVersion: v1.13.11-eks-5876d6
    kubeletVersion: v1.13.11-eks-5876d6
    machineID: 0e76fec3e06d41a6bf2c49a18fbe1795
    operatingSystem: linux
    osImage: Amazon Linux 2
    systemUUID: EC29973A-D616-F673-6899-A96C97D5AE2D
-- Ryan
kubernetes

1 Answer

11/5/2019

Unfortunately, I don't have an exact answer to your issue, but I may have a workaround.

I think I had the same issue with an Amazon EKS cluster, version 1.13.11: my pod triggered a node scale-up, was scheduled, ran for 300s and was then evicted:

74m         Normal    TaintManagerEviction   pod/master-3bb760a7-b782-4138-b09f-0ca385db9ad7-workspace   Marking for deletion Pod project-beta/master-3bb760a7-b782-4138-b09f-0ca385db9ad7-workspace

Interestingly, the same pod ran with no problem if it was scheduled on an existing node rather than one that had just been created.

From my investigation, it really looks like an issue with this specific Kubernetes version, maybe some edge case of the TaintBasedEvictions feature (I think it was enabled by default in Kubernetes 1.13).
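
Until you can upgrade, one stopgap I can think of (just a guess on my side, I have not verified it against this exact bug) is to drop tolerationSeconds for those two taints so the taint manager never evicts on them; the trade-off is that pods will then linger on genuinely dead nodes. Assuming your Deployment is named my-pod:

# Strategic merge replaces the whole tolerations list, so include every toleration you need
kubectl -n prod patch deployment my-pod -p '{
  "spec": {"template": {"spec": {"tolerations": [
    {"key": "specialized", "operator": "Exists"},
    {"key": "node.kubernetes.io/unreachable", "operator": "Exists", "effect": "NoExecute"},
    {"key": "node.kubernetes.io/not-ready", "operator": "Exists", "effect": "NoExecute"}
  ]}}}}'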

To "fix" this issue I updated cluster version to 1.14. After that, mysterious pod eviction did not happen anymore.

So, if it's possible for you, I suggest upgrading your cluster to version 1.14 (together with cluster-autoscaler).
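
For an EKS cluster the upgrade itself is something like this (cluster name and the autoscaler image tag are placeholders; worker node groups need their AMIs updated separately):

# Upgrade the EKS control plane to 1.14
aws eks update-cluster-version --name <your-cluster-name> --kubernetes-version 1.14

# Then bump cluster-autoscaler to a release that matches the new minor version
kubectl -n kube-system set image deployment/cluster-autoscaler \
  cluster-autoscaler=k8s.gcr.io/cluster-autoscaler:<matching-1.14.x-tag>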

-- Rybue
Source: StackOverflow