GKE Pods not viewable after cluster downsizes

1/5/2020

I'm running parallel jobs with expansions on an auto-scaled cluster. While a pod's node is still running, I can view the pod in the "Workloads" section of "Kubernetes Engine". But if the cluster downsizes due to lack of work, the pods associated with the removed nodes disappear from that view (and also from the CLI output of kubectl get pods).

Is there any way to keep this information from disappearing? It would be very useful to know each pod's success/failure status and to access its logs easily.

-- Uric Sou
google-kubernetes-engine

1 Answer

1/6/2020

I found the documentation on running Jobs in GKE, and I think you can inspect the Job with the command kubectl describe job [JobName] and observe its events, even after the node has been deleted by autoscaling:

Events:
  Type    Reason            Age   From            Message
  ----    ------            ----  ----            -------
  Normal  SuccessfulCreate  16m   job-controller  Created pod: [JobName]-4fkr2
  Normal  SuccessfulCreate  16m   job-controller  Created pod: [JobName]-fvr9n
  Normal  SuccessfulCreate  16m   job-controller  Created pod: [JobName]-jwjgz
  Normal  SuccessfulCreate  16m   job-controller  Created pod: [JobName]-ws4t7
  Normal  SuccessfulCreate  16m   job-controller  Created pod: [JobName]-jjjdl
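Beyond the events, the Job object itself keeps counters of active, succeeded, and failed pods in its status, and the Job is stored by the API server rather than on a node. The following is a minimal sketch, assuming kubectl is already configured for the cluster; the helper names summarize_job and fetch_job and the sample values are hypothetical and not part of the original answer.

```python
import json
import subprocess


def summarize_job(job: dict) -> dict:
    """Extract the pod counters from a Job object (kubectl get job -o json)."""
    status = job.get("status", {})
    # Counters that are zero are omitted from the API object, so default to 0.
    return {
        "active": status.get("active", 0),
        "succeeded": status.get("succeeded", 0),
        "failed": status.get("failed", 0),
    }


def fetch_job(job_name: str, namespace: str = "default") -> dict:
    """Fetch a Job as a dict via kubectl (requires access to the cluster)."""
    result = subprocess.run(
        ["kubectl", "get", "job", job_name, "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


# Illustrative Job object trimmed to the fields used here (made-up values).
sample = {"metadata": {"name": "example-job"}, "status": {"succeeded": 4, "failed": 1}}
print(summarize_job(sample))
```

Against a real cluster, summarize_job(fetch_job("[JobName]")) would report the same counters for the Job in question. This gives the success/failure totals but not the logs, which is what the logging approach is for.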

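Alternatively, if you have enabled Stackdriver logging for GKE (preferably Stackdriver Kubernetes Engine Monitoring, since Legacy Stackdriver support is being deprecated), you can use the filter below [1] in the advanced log queries to inspect the logs for the pods under your jobs. Note that the filter as written uses the legacy container resource type; with Stackdriver Kubernetes Engine Monitoring the resource type is k8s_container, and the labels are namespace_name, pod_name, and location instead of namespace_id, pod_id, and zone.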

[1]

resource.type="container"
resource.labels.cluster_name="[ClusterName]"
resource.labels.namespace_id="[Namespace]"
resource.labels.project_id="[ProjectID]"
resource.labels.zone:"[ZONE]"
resource.labels.container_name="[ContainerName]"
resource.labels.pod_id:"[JobName]-"
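If the filter has to be produced for several jobs, it can be generated rather than edited by hand. The sketch below only assembles the filter text shown above from its parameters; the function name build_log_filter and the example values are hypothetical. It keeps the same operators as the original filter: = for an exact match, and : for a substring match on the zone and on the pod name prefix.

```python
def build_log_filter(project_id, cluster_name, namespace, zone, container_name, job_name):
    """Assemble the log filter from the answer for one job's pods."""
    lines = [
        'resource.type="container"',
        f'resource.labels.cluster_name="{cluster_name}"',
        f'resource.labels.namespace_id="{namespace}"',
        f'resource.labels.project_id="{project_id}"',
        # ':' is a substring ("has") match rather than an exact match.
        f'resource.labels.zone:"{zone}"',
        f'resource.labels.container_name="{container_name}"',
        # Pods created by a Job are named "<job name>-<suffix>".
        f'resource.labels.pod_id:"{job_name}-"',
    ]
    return "\n".join(lines)


# Example values are placeholders, not taken from a real project.
example_filter = build_log_filter(
    project_id="my-project",
    cluster_name="my-cluster",
    namespace="default",
    zone="us-central1",
    container_name="worker",
    job_name="example-job",
)
print(example_filter)
```

The resulting text can be pasted into the advanced log query box, or passed as the filter argument to gcloud logging read.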
-- Muss Rahman
Source: StackOverflow