Where does the job status in the GKE UI come from, and why does it differ from 'kubectl get job'?

6/4/2020

The GKE UI shows a different status for my job than the one I get back from kubectl. As far as I can tell, the GKE UI status is correct and the kubectl status is wrong. I want to get the correct status programmatically using read_namespaced_job in the Python API, but that status matches kubectl, which seems to be the wrong one.

Where does this status in the GKE UI come from?

In GKE UI:

apiVersion: batch/v1
kind: Job
metadata:
  creationTimestamp: "2020-06-04T08:00:06Z"
  labels:
    controller-uid: ee750648-1189-4ed5-9803-054d407aa0b2
    job-name: tf-nightly-transformer-translate-func-v2-32-1591257600
  name: tf-nightly-transformer-translate-func-v2-32-1591257600
  namespace: automated
  ownerReferences:
  - apiVersion: batch/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: CronJob
    name: tf-nightly-transformer-translate-func-v2-32
    uid: 5b619895-4c08-45e9-8981-fbd95980ff4e
  resourceVersion: "16109561"
  selfLink: /apis/batch/v1/namespaces/automated/jobs/tf-nightly-transformer-translate-func-v2-32-1591257600
  uid: ee750648-1189-4ed5-9803-054d407aa0b2
  
...

status:
  completionTime: "2020-06-04T08:41:41Z"
  conditions:
  - lastProbeTime: "2020-06-04T08:41:41Z"
    lastTransitionTime: "2020-06-04T08:41:41Z"
    status: "True"
    type: Complete
  startTime: "2020-06-04T08:00:06Z"
  succeeded: 1

From kubectl:

zcain@zcain:~$ kubectl get job tf-nightly-transformer-translate-func-v2-32-1591257600 --namespace=automated -o yaml
apiVersion: batch/v1
kind: Job
metadata:
  creationTimestamp: "2020-06-04T08:00:27Z"
  labels:
    controller-uid: b5d4fb20-df8d-45d8-a8b5-e3b0c40999be
    job-name: tf-nightly-transformer-translate-func-v2-32-1591257600
  name: tf-nightly-transformer-translate-func-v2-32-1591257600
  namespace: automated
  ownerReferences:
  - apiVersion: batch/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: CronJob
    name: tf-nightly-transformer-translate-func-v2-32
    uid: 51a40f4a-5595-49a1-b63f-db75b0849206
  resourceVersion: "32712722"
  selfLink: /apis/batch/v1/namespaces/automated/jobs/tf-nightly-transformer-translate-func-v2-32-1591257600
  uid: b5d4fb20-df8d-45d8-a8b5-e3b0c40999be

...

status:
  conditions:
  - lastProbeTime: "2020-06-04T12:04:58Z"
    lastTransitionTime: "2020-06-04T12:04:58Z"
    message: Job was active longer than specified deadline
    reason: DeadlineExceeded
    status: "True"
    type: Failed
  startTime: "2020-06-04T11:04:58Z"

Environment:

Kubernetes version (kubectl version):
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.9", GitCommit:"2e808b7cb054ee242b68e62455323aa783991f03", GitTreeState:"clean", BuildDate:"2020-01-18T23:33:14Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15+", GitVersion:"v1.15.9-gke.26", GitCommit:"525ce678faa2b28483fa9569757a61f92b7b0988", GitTreeState:"clean", BuildDate:"2020-03-06T18:47:39Z", GoVersion:"go1.12.12b4", Compiler:"gc", Platform:"linux/amd64"}

OS (cat /etc/os-release):
PRETTY_NAME="Debian GNU/Linux rodete"

Python version (python --version):
Python 3.7.7

Python client version (pip list | grep kubernetes):
kubernetes 10.0.1
-- Zachary Cain
google-kubernetes-engine
kubernetes

1 Answer

6/5/2020

For anyone else who runs into a similar issue: the problem is with the kubeconfig file (/usr/local/google/home/zcain/.kube/config for me).

It contains a line like this: current-context: gke_xl-ml-test_europe-west4-a_xl-ml-test-europe-west4

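If current-context points to a different cluster or zone than the one where your job ran, then kubectl get job and the Python API both query that other cluster, and the status you get back presumably belongs to a different job with the same name there. That fits the two outputs in the question, which show different metadata.uid values and creation timestamps, so they describe two different Job objects. I feel like it should just error out, but instead I got the behavior above, where I get back an incorrect status.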
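One way to catch this before reading any job status is to compare the active context with the one you expect. The following is only a sketch: the helper names are made up for illustration, and the gke_<project>_<location>_<cluster> naming is an assumption inferred from the current-context value shown above. list_kube_config_contexts comes from the kubernetes Python client and is imported lazily so the pure helpers work without it.

```python
def gke_context_name(project, location, cluster):
    """Build a context name in the gke_<project>_<location>_<cluster> form.

    Assumption: this mirrors the current-context value seen in the kubeconfig.
    """
    return f"gke_{project}_{location}_{cluster}"


def active_context_name():
    """Return the name of the kubeconfig's current-context."""
    # Lazy import: needs the 'kubernetes' package and a readable kubeconfig.
    from kubernetes import config

    _contexts, active = config.list_kube_config_contexts()
    return active["name"]


def check_context(expected, actual):
    """Raise instead of silently querying the wrong cluster (hypothetical helper)."""
    if actual != expected:
        raise RuntimeError(
            f"kubeconfig points at {actual!r}, expected {expected!r}"
        )
```

Calling check_context(gke_context_name("xl-ml-test", "europe-west4-a", "xl-ml-test-europe-west4"), active_context_name()) before any job query would then fail loudly when the kubeconfig points somewhere else.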

You can run something like gcloud container clusters get-credentials xl-ml-test-europe-west4 --zone europe-west4-a to point your kubeconfig's current-context at the correct cluster.
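For the Python API, an alternative to relying on current-context is to name the context explicitly when building the client. This is a rough sketch, not something I have run against this cluster: job_outcome and read_job are hypothetical helper names, and it assumes the context already exists in your kubeconfig. new_client_from_config and read_namespaced_job are calls from the kubernetes Python client.

```python
from types import SimpleNamespace


def job_outcome(conditions):
    """Return 'Complete', 'Failed', or 'Unfinished' from a Job's status.conditions."""
    for cond in conditions or []:
        if cond.status == "True" and cond.type in ("Complete", "Failed"):
            return cond.type
    return "Unfinished"


def read_job(name, namespace, context):
    """Read a Job from an explicitly named kubeconfig context.

    Returns (uid, outcome); the uid helps confirm you are looking at the
    same Job object that another tool shows.
    """
    # Lazy import: needs the 'kubernetes' package and cluster access.
    from kubernetes import client, config

    api_client = config.new_client_from_config(context=context)
    batch = client.BatchV1Api(api_client)
    job = batch.read_namespaced_job(name=name, namespace=namespace)
    return job.metadata.uid, job_outcome(job.status.conditions)


if __name__ == "__main__":
    # Offline demo using the condition from the kubectl output in the question.
    failed = [SimpleNamespace(type="Failed", status="True")]
    print(job_outcome(failed))
```

Comparing the returned uid with the one shown in the GKE UI is a quick way to tell whether both are describing the same job.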

-- Zachary Cain
Source: StackOverflow