I can see the message "Node is shutting, evicting pods" in the pod description. This happens only for pods that have a specific toleration and node selector targeting a preemptible node pool.
We added tolerations to the pods and created separate node pools with different taints (preemptible, non-preemptible) to segregate preemptible and non-preemptible pods in the cluster.
A cluster without taints works fine.
In the cluster with taints, pods get stuck in Shutdown status (only the pods that were deployed on the preemptible node pool).
Here is the pod description:
Namespace: XXXXXX
Priority: 0
Node: gke-cluster-reliable-preemptible-node-XXXXXX
Start Time: Tue, 10 Aug 2021 16:44:30 +0530
Labels: app=XXXX
pod-template-hash=XXXX
release=XXXX
repo=XXX
Annotations: randVersion: a200a
Status: Failed
Reason: Shutdown
Message: Node is shutting, evicting pods
IP:
IPs: <none>
Controlled By: ReplicaSet/career-assessor-be-8467d6c885
Containers:
career-assessor-be:
Image: XXXXXX
Port: 8001/TCP
Host Port: 0/TCP
Command:
/bin/sh
-c
Args:
XXXXX
Limits:
cpu: 3200m
memory: 2400Mi
Requests:
cpu: 1600m
memory: 1800Mi
Environment Variables from:
careerassessor-config ConfigMap Optional: false
Environment:
LOG_TO_CONSOLE: 1
INACTIVITY_PERIOD:
USER_EMAIL: jyostna@springboard.com
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-xpdwd (ro)
cloudsql-proxy:
Image: gcr.io/cloudsql-docker/gce-proxy:1.17
Port: <none>
Host Port: <none>
Command:
/cloud_sql_proxy
-instances=$(CLOUD_SQL_CONNECTION_NAME)=tcp:0.0.0.0:3306
-credential_file=/secrets/cloudsql/cloudsql-instance-credentials.json
-term_timeout=$(CLOUD_SQL_CONNECTION_TIMEOUT)s
Limits:
cpu: 100m
memory: 50Mi
Requests:
cpu: 20m
memory: 20Mi
Environment:
CLOUD_SQL_CONNECTION_NAME: <set to the key 'CLOUD_SQL_CONNECTION_NAME' of config map 'careerassessor-config'> Optional: false
CLOUD_SQL_CONNECTION_TIMEOUT: <set to the key 'CLOUD_SQL_CONNECTION_TIMEOUT' of config map 'careerassessor-config'> Optional: false
Mounts:
/secrets/cloudsql from careerassessor-cloudsql-instance-credentials (ro)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-xpdwd (ro)
Volumes:
careerassessor-cloudsql-instance-credentials:
Type: Secret (a volume populated by a Secret)
SecretName: XXXXX
Optional: false
default-token-xpdwd:
Type: Secret (a volume populated by a Secret)
SecretName: XXX
Optional: false
QoS Class: Burstable
Node-Selectors: non-preemptible=false
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
non-preemptible=false:NoSchedule
Events: <none>
Here is the YAML of the pod:
apiVersion: v1
kind: Pod
metadata:
annotations:
randVersion: a200a
creationTimestamp: "2021-08-10T10:59:29Z"
generateName: xxx
labels:
app: xxx
pod-template-hash: 8467d6c885
release: xxxx
repo: xxx
managedFields:
- apiVersion: v1
fieldsType: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
.: {}
f:randVersion: {}
f:generateName: {}
f:labels:
.: {}
f:app: {}
f:pod-template-hash: {}
f:release: {}
f:repo: {}
f:ownerReferences:
.: {}
k:{"uid":"674b9e8e-420e-44e7-9601-871be01a9fcb"}:
.: {}
f:apiVersion: {}
f:blockOwnerDeletion: {}
f:controller: {}
f:kind: {}
f:name: {}
f:uid: {}
f:spec:
f:containers:
k:{"name":"career-assessor-be"}:
.: {}
f:args: {}
f:command: {}
f:env:
.: {}
k:{"name":"INACTIVITY_PERIOD"}:
.: {}
f:name: {}
k:{"name":"LOG_TO_CONSOLE"}:
.: {}
f:name: {}
f:value: {}
k:{"name":"USER_EMAIL"}:
.: {}
f:name: {}
f:value: {}
f:envFrom: {}
f:image: {}
f:imagePullPolicy: {}
f:name: {}
f:ports:
.: {}
k:{"containerPort":8001,"protocol":"TCP"}:
.: {}
f:containerPort: {}
f:name: {}
f:protocol: {}
f:resources:
.: {}
f:limits:
.: {}
f:cpu: {}
f:memory: {}
f:requests:
.: {}
f:cpu: {}
f:memory: {}
f:terminationMessagePath: {}
f:terminationMessagePolicy: {}
k:{"name":"cloudsql-proxy"}:
.: {}
f:command: {}
f:env:
.: {}
k:{"name":"CLOUD_SQL_CONNECTION_NAME"}:
.: {}
f:name: {}
f:valueFrom:
.: {}
f:configMapKeyRef:
.: {}
f:key: {}
f:name: {}
k:{"name":"CLOUD_SQL_CONNECTION_TIMEOUT"}:
.: {}
f:name: {}
f:valueFrom:
.: {}
f:configMapKeyRef:
.: {}
f:key: {}
f:name: {}
f:image: {}
f:imagePullPolicy: {}
f:name: {}
f:resources:
.: {}
f:limits:
.: {}
f:cpu: {}
f:memory: {}
f:requests:
.: {}
f:cpu: {}
f:memory: {}
f:terminationMessagePath: {}
f:terminationMessagePolicy: {}
f:volumeMounts:
.: {}
k:{"mountPath":"/secrets/cloudsql"}:
.: {}
f:mountPath: {}
f:name: {}
f:readOnly: {}
f:dnsPolicy: {}
f:enableServiceLinks: {}
f:nodeSelector:
.: {}
f:non-preemptible: {}
f:restartPolicy: {}
f:schedulerName: {}
f:securityContext: {}
f:terminationGracePeriodSeconds: {}
f:tolerations: {}
f:volumes:
.: {}
k:{"name":"careerassessor-cloudsql-instance-credentials"}:
.: {}
f:name: {}
f:secret:
.: {}
f:defaultMode: {}
f:secretName: {}
manager: kube-controller-manager
operation: Update
time: "2021-08-10T10:59:29Z"
- apiVersion: v1
fieldsType: FieldsV1
fieldsV1:
f:status:
f:conditions:
k:{"type":"PodScheduled"}:
f:message: {}
f:reason: {}
manager: kube-scheduler
operation: Update
time: "2021-08-10T10:59:29Z"
- apiVersion: v1
fieldsType: FieldsV1
fieldsV1:
f:status:
f:message: {}
f:phase: {}
f:reason: {}
f:startTime: {}
manager: kubelet
operation: Update
time: "2021-08-10T11:51:28Z"
name: career-assessor-be-8467d6c885-h27sh
namespace: jyostna1
ownerReferences:
- apiVersion: apps/v1
blockOwnerDeletion: true
controller: true
kind: ReplicaSet
name: career-assessor-be-8467d6c885
uid: 674b9e8e-420e-44e7-9601-871be01a9fcb
resourceVersion: "48899168"
uid: 8837f88d-7e3e-444f-a804-32a7a6e98c71
spec:
containers:
- args:
- |
xxxx
command:
- /bin/sh
- -c
env:
- name: LOG_TO_CONSOLE
value: "1"
- name: INACTIVITY_PERIOD
- name: USER_EMAIL
value: jyostna@springboard.com
envFrom:
- configMapRef:
name: careerassessor-config
image: us.gcr.io/springboard-production/career_assessor:IP-405-implement-explored-strategy-for-r
imagePullPolicy: Always
name: career-assessor-be
ports:
- containerPort: 8001
name: be-port
protocol: TCP
resources:
limits:
cpu: 3200m
memory: 2400Mi
requests:
cpu: 1600m
memory: 1800Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-xpdwd
readOnly: true
- command:
- /cloud_sql_proxy
- -instances=$(CLOUD_SQL_CONNECTION_NAME)=tcp:0.0.0.0:3306
- -credential_file=/secrets/cloudsql/cloudsql-instance-credentials.json
- -term_timeout=$(CLOUD_SQL_CONNECTION_TIMEOUT)s
env:
- name: CLOUD_SQL_CONNECTION_NAME
valueFrom:
configMapKeyRef:
key: CLOUD_SQL_CONNECTION_NAME
name: careerassessor-config
- name: CLOUD_SQL_CONNECTION_TIMEOUT
valueFrom:
configMapKeyRef:
key: CLOUD_SQL_CONNECTION_TIMEOUT
name: careerassessor-config
image: gcr.io/cloudsql-docker/gce-proxy:1.17
imagePullPolicy: IfNotPresent
name: cloudsql-proxy
resources:
limits:
cpu: 100m
memory: 50Mi
requests:
cpu: 20m
memory: 20Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /secrets/cloudsql
name: careerassessor-cloudsql-instance-credentials
readOnly: true
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-xpdwd
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
nodeName: gke-cluster-reliable-preemptible-node-4b42c9be-x9qs
nodeSelector:
non-preemptible: "false"
preemptionPolicy: PreemptLowerPriority
priority: 0
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: default
serviceAccountName: default
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoSchedule
key: non-preemptible
operator: Equal
value: "false"
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- name: careerassessor-cloudsql-instance-credentials
secret:
defaultMode: 420
secretName: careerassessor-cloudsql-instance-credentials
- name: default-token-xpdwd
secret:
defaultMode: 420
secretName: default-token-xpdwd
status:
message: Node is shutting, evicting pods
phase: Failed
reason: Shutdown
startTime: "2021-08-10T11:14:30Z"
Description of the node:
Name: gke-cluster-reliable-preemptible-node-xxxxx
Roles: <none>
Labels: beta.kubernetes.io/arch=xxx
beta.kubernetes.io/instance-type=xxx
beta.kubernetes.io/os=linux
cloud.google.com/gke-boot-disk=pd-standard
cloud.google.com/gke-container-runtime=containerd
cloud.google.com/gke-nodepool=preemptible-nodepool
cloud.google.com/gke-os-distribution=cos
cloud.google.com/gke-preemptible=true
cloud.google.com/machine-family=n1
failure-domain.beta.kubernetes.io/region=us-central1
failure-domain.beta.kubernetes.io/zone=us-central1-a
kubernetes.io/arch=amd64
kubernetes.io/hostname=gke-cluster-reliable-preemptible-node-xxxx
kubernetes.io/os=linux
node.kubernetes.io/instance-type=n1-standard-4
non-preemptible=false
topology.gke.io/zone=us-central1-a
topology.kubernetes.io/region=us-central1
topology.kubernetes.io/zone=us-central1-a
Annotations: container.googleapis.com/instance_id: 7488269578212988511
csi.volume.kubernetes.io/nodeid:
{"pd.csi.storage.gke.io":"projects/playground-206205/zones/us-central1-a/instances/gke-cluster-reliable-preemptible-node-4b42c9be-x9qs"}
node.alpha.kubernetes.io/ttl: 0
node.gke.io/last-applied-node-labels:
cloud.google.com/gke-boot-disk=pd-standard,cloud.google.com/gke-container-runtime=containerd,cloud.google.com/gke-nodepool=preemptible-nod...
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Tue, 10 Aug 2021 17:24:03 +0530
Taints: non-preemptible=false:NoSchedule
Unschedulable: false
Lease:
HolderIdentity: gke-cluster-reliable-preemptible-node-4b42c9be-x9qs
AcquireTime: <unset>
RenewTime: Tue, 10 Aug 2021 20:27:03 +0530
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
FrequentDockerRestart False Tue, 10 Aug 2021 20:24:28 +0530 Tue, 10 Aug 2021 17:24:08 +0530 NoFrequentDockerRestart docker is functioning properly
FrequentContainerdRestart False Tue, 10 Aug 2021 20:24:28 +0530 Tue, 10 Aug 2021 17:24:08 +0530 NoFrequentContainerdRestart containerd is functioning properly
FrequentUnregisterNetDevice False Tue, 10 Aug 2021 20:24:28 +0530 Tue, 10 Aug 2021 17:24:08 +0530 NoFrequentUnregisterNetDevice node is functioning properly
CorruptDockerOverlay2 False Tue, 10 Aug 2021 20:24:28 +0530 Tue, 10 Aug 2021 17:24:08 +0530 NoCorruptDockerOverlay2 docker overlay2 is functioning properly
KernelDeadlock False Tue, 10 Aug 2021 20:24:28 +0530 Tue, 10 Aug 2021 17:24:08 +0530 KernelHasNoDeadlock kernel has no deadlock
ReadonlyFilesystem False Tue, 10 Aug 2021 20:24:28 +0530 Tue, 10 Aug 2021 17:24:08 +0530 FilesystemIsNotReadOnly Filesystem is not read-only
FrequentKubeletRestart False Tue, 10 Aug 2021 20:24:28 +0530 Tue, 10 Aug 2021 17:24:08 +0530 NoFrequentKubeletRestart kubelet is functioning properly
NetworkUnavailable False Tue, 10 Aug 2021 17:24:03 +0530 Tue, 10 Aug 2021 17:24:03 +0530 RouteCreated NodeController create implicit route
MemoryPressure False Tue, 10 Aug 2021 20:26:16 +0530 Tue, 10 Aug 2021 17:24:00 +0530 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Tue, 10 Aug 2021 20:26:16 +0530 Tue, 10 Aug 2021 17:24:00 +0530 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Tue, 10 Aug 2021 20:26:16 +0530 Tue, 10 Aug 2021 17:24:00 +0530 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Tue, 10 Aug 2021 20:26:16 +0530 Tue, 10 Aug 2021 17:24:03 +0530 KubeletReady kubelet is posting ready status. AppArmor enabled
Addresses:
InternalIP: 10.128.0.100
ExternalIP: 34.133.49.148
InternalDNS: gke-cluster-reliable-preemptible-node-4b42c9be-x9qs.c.playground-206205.internal
Hostname: gke-cluster-reliable-preemptible-node-4b42c9be-x9qs.c.playground-206205.internal
Capacity:
attachable-volumes-gce-pd: 127
cpu: 4
Thanks for all the info. According to the documentation, and assuming your GKE cluster is on version 1.20 or later:
On preemptible GKE nodes running versions 1.20 or later, the kubelet graceful node shutdown feature is enabled by default. As a result, kubelet detects preemption and gracefully terminates Pods.
For Pods on preemptible nodes, do not specify more than 25 seconds for terminationGracePeriodSeconds because those Pods will only receive 25 seconds during preemption.
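Given that constraint, it makes sense to cap the grace period of pods scheduled onto the preemptible pool. A minimal sketch (the pod name, container name, and image below are placeholders, not from your manifest):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-preemptible-pod   # hypothetical name
spec:
  # Keep this at or below 25: on preemption the kubelet only grants
  # about 25 seconds before the VM is reclaimed, so a larger value
  # (such as the default 30) is never fully honored.
  terminationGracePeriodSeconds: 25
  containers:
  - name: app               # placeholder
    image: example:latest   # placeholder
```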
The best way to use taints and tolerations here is to rely on the default label created on preemptible VMs (see "Tainting a node for preemptible VMs" in the docs):
kubectl taint nodes node-name cloud.google.com/gke-preemptible="true":NoSchedule
Then add the corresponding toleration to a Pod:
tolerations:
- key: cloud.google.com/gke-preemptible
operator: Equal
value: "true"
effect: NoSchedule
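The toleration only allows the pod onto the tainted nodes; to also force it to schedule there (as your `non-preemptible=false` selector does today), it is typically paired with a node selector on the same default label. A sketch, assuming the `cloud.google.com/gke-preemptible=true` label that GKE applies automatically (visible in your node description above):

```yaml
spec:
  # nodeSelector restricts the pod to preemptible nodes;
  # the toleration lets it pass the matching taint.
  nodeSelector:
    cloud.google.com/gke-preemptible: "true"
  tolerations:
  - key: cloud.google.com/gke-preemptible
    operator: Equal
    value: "true"
    effect: NoSchedule
```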
Also:
When the kubelet terminates Pods during preemptible node shutdown, it assigns a Failed status and a Shutdown reason to the Pods. These Pods are cleaned up during the next garbage collection. You can also delete shutdown Pods manually using the following command:
kubectl get pods --all-namespaces | grep -i shutdown | awk '{print $1, $2}' | xargs -n2 kubectl delete pod -n
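To see what that pipeline feeds to `kubectl delete`, here is a dry run against two hypothetical rows of `kubectl get pods --all-namespaces` output (the namespaces and pod names are made up, and `echo` prints the resulting commands instead of executing them):

```shell
# Simulated `kubectl get pods --all-namespaces` rows whose STATUS column
# reads Shutdown. awk keeps the NAMESPACE and NAME columns, and
# `xargs -n2` hands exactly one namespace/name pair to each invocation,
# producing `kubectl delete pod -n NAMESPACE NAME` per pod.
printf '%s\n' \
  'ns-a   pod-1   0/2   Shutdown   0   3h' \
  'ns-b   pod-2   0/2   Shutdown   0   5h' \
  | grep -i shutdown \
  | awk '{print $1, $2}' \
  | xargs -n2 echo kubectl delete pod -n
```

Without `-n2`, `xargs` would append every pair to a single `kubectl delete pod -n` invocation, which fails as soon as more than one namespace is involved.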
Please review the full documentation, which explains all the details: https://cloud.google.com/kubernetes-engine/docs/how-to/preemptible-vms