kubectl status.phase=Running returns wrong results

8/11/2021

When I run:

kubectl get pods --field-selector=status.phase=Running

I see:

NAME          READY   STATUS    RESTARTS   AGE
k8s-fbd7b     2/2     Running   0          5m5s
testm-45gfg   1/2     Error     0          22h

I don't understand why this command gives me pods that are in the Error status. According to the Kubernetes API, there is no such thing as STATUS=Error.

How can I get only the pods that are in this Error status?

When I run:

kubectl get pods --field-selector=status.phase=Failed

It tells me that there are no pods in that status.

-- Slava
kubectl
kubernetes

3 Answers

8/11/2021

Using the kubectl get pods --field-selector=status.phase=Failed command you can display all Pods in the Failed phase.

Failed means that all containers in the Pod have terminated, and at least one container has terminated in failure (see: Pod phase):

Failed - All containers in the Pod have terminated, and at least one container has terminated in failure. That is, the container either exited with non-zero status or was terminated by the system.

In your example, both Pods are in the Running phase because at least one container is still running in each of these Pods:

Running - The Pod has been bound to a node, and all of the containers have been created. At least one container is still running, or is in the process of starting or restarting.

You can check the current phase of Pods using the following command:

$ kubectl get pod -o=jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}'

Let's check how this command works:

$ kubectl get pods
NAME    READY   STATUS   
app-1   1/2     Error   
app-2   0/1     Error   

$ kubectl get pod -o=jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}'
app-1   Running
app-2   Failed

As you can see, only the app-2 Pod is in the Failed phase. There is still one container running in the app-1 Pod, so this Pod is in the Running phase.

To list all pods with the Error status, you can simply use:

$ kubectl get pods -A | grep Error
default       app-1   1/2     Error     
default       app-2   0/1     Error

Additionally, it's worth mentioning that you can check the state of all containers in Pods:

$ kubectl get pod -o=jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].state}{"\n"}{end}'
app-1   {"terminated":{"containerID":"containerd://f208e2a1ff08c5ce2acf3a33da05603c1947107e398d2f5fbf6f35d8b273ac71","exitCode":2,"finishedAt":"2021-08-11T14:07:21Z","reason":"Error","startedAt":"2021-08-11T14:07:21Z"}} {"running":{"startedAt":"2021-08-11T14:07:21Z"}}
app-2   {"terminated":{"containerID":"containerd://7a66cbbf73985efaaf348ec2f7a14d8e5bf22f891bd655c4b64692005eb0439b","exitCode":2,"finishedAt":"2021-08-11T14:08:50Z","reason":"Error","startedAt":"2021-08-11T14:08:50Z"}}
-- matt_j
Source: StackOverflow

8/11/2021

You can simply grep the Error pods using:

kubectl get pods --all-namespaces | grep Error

To remove all Error pods from the cluster:

kubectl delete pod `kubectl get pods --namespace <yournamespace> | awk '$3 == "Error" {print $1}'` --namespace <yournamespace>
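
If you want to preview which Pods would be removed before deleting anything, you can run the inner part of that command on its own; it prints just the names of Pods whose STATUS column is Error:

kubectl get pods --namespace <yournamespace> | awk '$3 == "Error" {print $1}'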

Most Pod failures return explicit error states that can be observed in the status field.

Error:

Your Pod crashed: it was scheduled onto a node successfully but crashed after that. To debug it further, you can use different methods or commands, for example:

kubectl describe pod <pod-name> -n <namespace>
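
Another option is to check the logs of the crashed container (assuming the placeholder pod, namespace and container names below): the --previous flag returns output from the last terminated instance, and -c selects a specific container in a multi-container Pod:

kubectl logs <pod-name> -n <namespace> --previous
kubectl logs <pod-name> -n <namespace> -c <container-name> --previous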

https://kubernetes.io/docs/tasks/debug-application-cluster/debug-pod-replication-controller/#my-pod-is-crashing-or-otherwise-unhealthy

-- Harsh Manvar
Source: StackOverflow

8/12/2021

Here is an overkill, go-template-based attempt:

kubectl  get pods -o go-template='{{range $index, $element := .items}}{{range .status.containerStatuses}}{{range .state }}{{if .reason }}{{if (eq  .reason "Error") }}{{$element.metadata.name}} {{$element.metadata.namespace}}{{"\n"}}{{end}}{{end}}{{end}}{{end}}{{end}}'
job1-stn45 default
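
In short, the template iterates over every Pod and each of its container states, and prints the Pod name and namespace whenever a state's reason is Error.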

My pod status:

k get pod
NAME                         READY   STATUS             RESTARTS   AGE
foo                          1/1     Running            1          2d11h
nginx-0                      1/1     Running            3          5d10h
nginx-2                      1/1     Running            3          5d10h
nginx-1                      1/1     Running            3          5d10h
job1-stn45                   0/1     Error              0          113m
update-test-27145740-82z7s   0/1     ImagePullBackOff   0          96m
update-test-27145500-7f2l9   0/1     ImagePullBackOff   0          5h36m
-- P....
Source: StackOverflow