I'm looking to put together a precise set of conditions that indicate that a container has failed.
1) A failing readiness check is not enough -- a container could be in an Init or ContainerCreating state and therefore not yet ready, while still being healthy.
2) The Terminating state is not enough either -- a container could be terminating as part of a rolling reboot. In addition, the reason for the terminating state is of type 'string', so it could be anything, and I haven't been able to find an exhaustive list.
3) The Waiting state is not enough either - it has the same problem that the reason is a string, and there is no exhaustive list of possibilities. That said, checking the reason against ["CrashLoopBackOff", "ErrImagePull", "ImagePullBackOff"] does give a definitive answer that the container did indeed fail (see the sketch right after this list).
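As a minimal sketch of that last point, the waiting reasons can be read straight from the pod's container statuses and matched against that list. This assumes a pod named mypod and relies on the .status.containerStatuses[*].state.waiting.reason jsonpath:
# Collect the waiting reason (if any) of every container in the pod.
reasons=$(kubectl get pod mypod -o jsonpath='{.status.containerStatuses[*].state.waiting.reason}')
# Flag the pod if any container is waiting for one of the known-bad reasons.
for reason in $reasons; do
  case "$reason" in
    CrashLoopBackOff|ErrImagePull|ImagePullBackOff)
      echo "Pod mypod has a failed container (reason: $reason)"
      ;;
  esac
done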
The use case, if that's important, is that I want to notify the user if the deployment/statefulset is in a failing state, but avoid false positives during creation and rolling reboots. I also consider it a failure when a rolling reboot leaves the new containers in CrashLoopBackOff. So I'm looking at container states to build up the health status of the deployments/statefulsets.
First of all, I'm assuming two things:
In this case, you have several options for determining the state of failed pods from outside of the pod/container itself.
Get failed pods from all namespaces:
kubectl get pods --field-selector=status.phase=Failed --all-namespaces
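If a simple count is enough (for example, to drive a notification), one possible variant is to suppress the header line and count the matching pods:
kubectl get pods --field-selector=status.phase=Failed --all-namespaces --no-headers | wc -l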
Simple: kubectl get pod mypod | tail -n 1 | awk '{ print $3 }'
Example Output:
"Failed"
"Running"
The Simple command can be used in a basic bash comparison such as:
if [ "$(kubectl get pod mypod | tail -n 1 | awk '{ print $3 }')" == "Failed" ];
then echo "Container has failed!";
fi
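A more robust variant of the same check, which avoids parsing column output, reads the pod phase directly via jsonpath (a sketch, again assuming the pod is named mypod):
# Query the pod phase field directly instead of scraping the STATUS column.
phase=$(kubectl get pod mypod -o jsonpath='{.status.phase}')
if [ "$phase" == "Failed" ];
then echo "Container has failed!";
fi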
Read more about the pod lifecycle:
https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/