The GKE UI shows a different status for my job than I get back from kubectl. As far as I can tell, the GKE UI status is correct and kubectl is wrong. I want to programmatically get the correct status using read_namespaced_job in the Python API, but that call returns the same status as kubectl, which appears to be wrong.
Where does the status shown in the GKE UI come from?
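For reference, this is roughly the Python call I'm using (a minimal sketch; the job name and namespace match the YAML below, the rest of my script is omitted):

from kubernetes import client, config

# Loads whatever cluster the current-context in ~/.kube/config points at
config.load_kube_config()

batch_v1 = client.BatchV1Api()
job = batch_v1.read_namespaced_job(
    name="tf-nightly-transformer-translate-func-v2-32-1591257600",
    namespace="automated")
# This status matches kubectl, not the GKE UI
print(job.status)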
In GKE UI:
apiVersion: batch/v1
kind: Job
metadata:
  creationTimestamp: "2020-06-04T08:00:06Z"
  labels:
    controller-uid: ee750648-1189-4ed5-9803-054d407aa0b2
    job-name: tf-nightly-transformer-translate-func-v2-32-1591257600
  name: tf-nightly-transformer-translate-func-v2-32-1591257600
  namespace: automated
  ownerReferences:
  - apiVersion: batch/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: CronJob
    name: tf-nightly-transformer-translate-func-v2-32
    uid: 5b619895-4c08-45e9-8981-fbd95980ff4e
  resourceVersion: "16109561"
  selfLink: /apis/batch/v1/namespaces/automated/jobs/tf-nightly-transformer-translate-func-v2-32-1591257600
  uid: ee750648-1189-4ed5-9803-054d407aa0b2
...
status:
  completionTime: "2020-06-04T08:41:41Z"
  conditions:
  - lastProbeTime: "2020-06-04T08:41:41Z"
    lastTransitionTime: "2020-06-04T08:41:41Z"
    status: "True"
    type: Complete
  startTime: "2020-06-04T08:00:06Z"
  succeeded: 1
From kubectl:
zcain@zcain:~$ kubectl get job tf-nightly-transformer-translate-func-v2-32-1591257600 --namespace=automated -o yaml
apiVersion: batch/v1
kind: Job
metadata:
  creationTimestamp: "2020-06-04T08:00:27Z"
  labels:
    controller-uid: b5d4fb20-df8d-45d8-a8b5-e3b0c40999be
    job-name: tf-nightly-transformer-translate-func-v2-32-1591257600
  name: tf-nightly-transformer-translate-func-v2-32-1591257600
  namespace: automated
  ownerReferences:
  - apiVersion: batch/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: CronJob
    name: tf-nightly-transformer-translate-func-v2-32
    uid: 51a40f4a-5595-49a1-b63f-db75b0849206
  resourceVersion: "32712722"
  selfLink: /apis/batch/v1/namespaces/automated/jobs/tf-nightly-transformer-translate-func-v2-32-1591257600
  uid: b5d4fb20-df8d-45d8-a8b5-e3b0c40999be
...
status:
  conditions:
  - lastProbeTime: "2020-06-04T12:04:58Z"
    lastTransitionTime: "2020-06-04T12:04:58Z"
    message: Job was active longer than specified deadline
    reason: DeadlineExceeded
    status: "True"
    type: Failed
  startTime: "2020-06-04T11:04:58Z"
Environment:
Kubernetes version (kubectl version):
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.9", GitCommit:"2e808b7cb054ee242b68e62455323aa783991f03", GitTreeState:"clean", BuildDate:"2020-01-18T23:33:14Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15+", GitVersion:"v1.15.9-gke.26", GitCommit:"525ce678faa2b28483fa9569757a61f92b7b0988", GitTreeState:"clean", BuildDate:"2020-03-06T18:47:39Z", GoVersion:"go1.12.12b4", Compiler:"gc", Platform:"linux/amd64"}
OS (cat /etc/os-release):
PRETTY_NAME="Debian GNU/Linux rodete"
Python version (python --version):
Python 3.7.7
Python client version (pip list | grep kubernetes):
kubernetes 10.0.1
For anyone else who finds a similar issue:
The problem is with the kubeconfig file (/usr/local/google/home/zcain/.kube/config for me). There is a line in there like this:
current-context: gke_xl-ml-test_europe-west4-a_xl-ml-test-europe-west4
If the current-context points to a different cluster or zone than the one where your job ran, then the status you get back from kubectl get job or from the Python API will not match the job you are looking at in the GKE UI. I feel like it should just error out, but instead you get the behavior above, where an incorrect status comes back. You can check which context is active with the sketch below.
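To see which cluster the client is actually talking to, you can print the active context, e.g. with the Python client (a sketch; kubectl config current-context shows the same thing):

from kubernetes import config

# Returns (all contexts in the kubeconfig, the active context)
contexts, active_context = config.list_kube_config_contexts()
print("current-context:", active_context["name"])
# Compare this against the project/zone/cluster where your job actually ran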
You can run something like gcloud container clusters get-credentials xl-ml-test-europe-west4 --zone europe-west4-a to update your kubeconfig and point current-context at the correct cluster.
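If you want the Python API to query the right cluster no matter what current-context happens to be set to, you can also pass the context explicitly when loading the kubeconfig. A sketch, using the context name format shown above; substitute the context for the cluster where your job actually ran:

from kubernetes import client, config

# Force the context for the cluster that ran the job instead of relying
# on whatever current-context is set to in the kubeconfig
config.load_kube_config(
    context="gke_xl-ml-test_europe-west4-a_xl-ml-test-europe-west4")

batch_v1 = client.BatchV1Api()
job = batch_v1.read_namespaced_job(
    name="tf-nightly-transformer-translate-func-v2-32-1591257600",
    namespace="automated")
# job.status should now come from the correct cluster
print(job.status)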