Pods "Pending" on Google Kubernetes Engine because of new "priority" field

5/13/2019

I have a Kubernetes cluster on GKE that's managed by Helm, and recently deploys started failing because pods would not leave the Pending state. Inspecting the pending pods, I see:

Events:
  Type     Reason            Age                From                                                Message
  ----     ------            ----               ----                                                -------
  Normal   Scheduled         50m                default-scheduler                                   Successfully assigned default/my-pod-6cbfb94cb-4pzl9 to gke-staging-pool-43f2e11c-nzjz
  Warning  FailedValidation  50m (x6 over 50m)  kubelet, gke-staging-pool-43f2e11c-nzjz  Error validating pod my-pod-6cbfb94cb-4pzl9_default(8e4dab93-75a7-11e9-80e1-42010a960181) from api, ignoring: spec.priority: Forbidden: Pod priority is disabled by feature-gate

Specifically, this warning seems relevant: Error validating pod my-pod-6cbfb94cb-4pzl9_default(8e4dab93-75a7-11e9-80e1-42010a960181) from api, ignoring: spec.priority: Forbidden: Pod priority is disabled by feature-gate

Comparing the new, Pending, pods with currently running pods, the only difference I see (apart from timestamps, etc) is:

$ kubectl get pod my-pod-6cbfb94cb-4pzl9 -o yaml > /tmp/pending-pod.yaml
$ kubectl get pod my-pod-7958cc964-64wsd -o yaml > /tmp/running-pod.yaml
$ diff /tmp/running-pod.yaml /tmp/pending-pod.yaml
@@ -58,7 +58,8 @@
       name: default-token-wnhwl
       readOnly: true
   dnsPolicy: ClusterFirst
-  nodeName: gke-staging-pool-43f2e11c-r4f9
+  nodeName: gke-staging-pool-43f2e11c-nzjz
+  priority: 0  # <-- notice that the `priority: 0` field is added
   restartPolicy: Always
   schedulerName: default-scheduler

It appears that this started happening sometime between May 1st and May 6th, 2019.

The cluster I've used as an example here is a staging cluster, but I've noticed the same behavior on two similarly configured production clusters, which leads me to believe something changed on the GKE side.
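(For reference, one way to check whether GKE upgraded the control plane out from under the nodes is to compare the cluster's current master and node versions; this is a sketch, with `staging` and the zone standing in for the real cluster name and location:

$ gcloud container clusters describe staging --zone us-central1-a \
    --format="value(currentMasterVersion,currentNodeVersion)"
)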

Pods are deployed by Helm through cloudbuild.yaml, and nothing in that setup (neither the Helm version nor the cloudbuild file) changed between May 1st and May 6th, when the regression seems to have been introduced.

Helm version:

Client: &version.Version{SemVer:"v2.8.2", GitCommit:"a80231648a1473929271764b920a8e346f6de844", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.8.2", GitCommit:"a80231648a1473929271764b920a8e346f6de844", GitTreeState:"clean"}
-- David Wolever
google-kubernetes-engine
kubernetes

1 Answer

5/14/2019

If you look at the docs for Pod Priority and Preemption, the feature is alpha in Kubernetes <= 1.10 (disabled by default; it has to be enabled with a feature gate, which GKE doesn't set on the control plane, AFAIK), and it became beta in >= 1.11 (enabled by default).
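You can confirm which versions are actually in play with something like this (a sketch; the exact output format varies a bit by kubectl version):

$ kubectl version --short   # "Server Version" is the control plane version
$ kubectl get nodes         # the VERSION column shows each node's kubelet version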

It could be either of these, or a combination of both:

  1. You have a GKE control plane at >= 1.11, while the node where your Pod is trying to start (having been scheduled there by the kube-scheduler) is running a kubelet at <= 1.10. Somebody could have upgraded the control plane without upgrading the nodes (or the instance group); the version check above would show that skew.

  2. Someone upgraded the control plane to 1.11 or later, where the Priority admission controller is enabled by default, and it's rejecting Pods that have the spec.priority field set directly when they start (or restart). The API docs for the priority field say that when the Priority admission controller is enabled, you cannot set that field yourself; it can only be populated from a PriorityClass, referenced via priorityClassName (see the sketch after this list).
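If it's case 1, upgrading the node pool brings the kubelets back in line with the control plane, e.g. (cluster and pool names are placeholders):

$ gcloud container clusters upgrade staging --node-pool=staging-pool

If it's case 2, the supported way to give Pods a priority is through a PriorityClass that the Pod references by name; a minimal sketch, where the class name and value are illustrative rather than anything your cluster ships with:

apiVersion: scheduling.k8s.io/v1beta1   # v1beta1 as of Kubernetes 1.11
kind: PriorityClass
metadata:
  name: default-priority                # hypothetical name
value: 0                                # priority assigned to Pods that use this class
globalDefault: false
description: "Illustrative PriorityClass; not a built-in."

Then, in the Pod (or Deployment template) spec, reference it instead of setting spec.priority directly:

  spec:
    priorityClassName: default-priority  # the admission controller fills in spec.priority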

-- Rico
Source: StackOverflow