kubernetes: precise set of conditions to determine failed container

8/30/2018

I'm looking to put together a precise set of conditions that indicate that a container has failed.

1) Readiness check is not enough -- a container could be in init or container creating state, and, thus, healthy.

2) Terminating state is not enough either -- a container could be terminating as part of a rolling reboot. In addition, the reason for the terminating state is of type 'string', so could be anything, and I haven't been able to find an exhaustive list.

3) Waiting state is not enough either - it has the same problem that the reason is a string, and there is no exhaustive list of possibilities. Though, checking for ["CrashLoopBackOff", "ErrImagePull", "ImagePullBackOff"] does give a definitive answer that the container did indeed fail.

The use case, if that's important, is that I want to notify the user if the deployment/statefulset is in a failing state, but avoid false-positives on creation and rolling reboots. I also consider the case when rolling reboot results in CrashLoopBackOff of new containers a failure. Thus, I'm looking to containers to build out the healthiness status of the deployments/statefulsets.

-- Lana Nova
kubernetes
pod

1 Answer

8/30/2018

First of all I'm assuming 2 things:

  1. By container you actually are referring to a pod with a single container in it.
  2. Your use case allows you to detect status externally

In this case, you have several options for determining the state of failed pods from outside of the pod/container itself.

  1. use Kubectl to get a list of failed containers

Get failed containers from all namespaces:

kubectl get pods --field-selector=status.phase=Failed --all-namespaces

  1. Use kubectl to Detect the state of pod

Simple: kubectl get pod mypod | tail -n 1 | awk '{ print $3 }'

Example Output:

"Failed"
"Running"

The Simple command can be used to do a simple bash comparison such as:

if [ $(`kubectl get pod mypod | tail -n 1 | awk '{ print $3 }'`) == "Failed" ]; 
    then echo "Container has failed!"; 
fi

Read more about pod lifecycles

https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/

-- yosefrow
Source: StackOverflow