I am currently working on a monitoring service that will monitor Kubernetes deployments and their pods. I want to notify users when a deployment is not running the expected number of replicas, and also when pods' containers restart unexpectedly. These may not be the right things to monitor, and I would greatly appreciate some feedback on what I should be monitoring.
Anyway, my main question is about the differences between all of the pod Statuses. By Status I mean the STATUS column shown when running `kubectl get pods`. The statuses in question are:
- ContainerCreating
- ImagePullBackOff
- Pending
- CrashLoopBackOff
- Error
- Running
What causes pods/containers to go into these states?
For the first four Statuses, are these states recoverable without user interaction?
What is the threshold for a `CrashLoopBackOff`?
Is `Running` the only status that has a Ready condition of `True`?
Any feedback would be greatly appreciated!
Also, would it be bad practice to use `kubectl` in an automated script for monitoring purposes? For example, every minute logging the results of `kubectl get pods` to Elasticsearch?
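Something along the lines of this rough sketch is what I have in mind (the Elasticsearch endpoint and index name are just placeholders):

```python
import json
import subprocess
import time

import requests

ES_URL = "http://elasticsearch:9200/pod-status/_doc"  # placeholder endpoint

while True:
    # Shell out to kubectl and capture the pod list as JSON
    result = subprocess.run(
        ["kubectl", "get", "pods", "--all-namespaces", "-o", "json"],
        capture_output=True, text=True, check=True,
    )
    pods = json.loads(result.stdout)

    # Index one document per pod into Elasticsearch
    for pod in pods["items"]:
        doc = {
            "namespace": pod["metadata"]["namespace"],
            "name": pod["metadata"]["name"],
            "phase": pod["status"].get("phase"),
            "timestamp": time.time(),
        }
        requests.post(ES_URL, json=doc)

    time.sleep(60)
```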
I will try to explain what lies behind each of these terms:

- `ContainerCreating`: shown while we are waiting for the image to be downloaded and the container to be created by Docker or another container runtime.
- `ImagePullBackOff`: shown when there is a problem downloading the image from a registry, for example wrong credentials for logging in to Docker Hub.
- `Pending`: the container is starting (if it takes a while to start), or it has started but the readinessProbe is failing.
- `CrashLoopBackOff`: shown when container restarts occur too often. For example, a process tries to read a file that does not exist and crashes; Kubernetes then recreates the container and the cycle repeats.
- `Error`: this one is fairly clear; there was some error running the container.
- `Running`: all is good, the container is running and the livenessProbe is OK.
You can see the pod lifecycle details in the k8s documentation. The recommended way of monitoring a Kubernetes cluster and its applications is with Prometheus.
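If you do end up scripting this yourself, one option is to talk to the API server directly with the official Python client instead of parsing `kubectl` output. Here is a minimal sketch (the printed messages and the list of "bad" waiting reasons are just illustrative) covering the two checks from your question: deployments not running the expected number of replicas, and containers that restart or get stuck in a bad waiting state:

```python
from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() when running inside a pod

apps = client.AppsV1Api()
core = client.CoreV1Api()

# Deployments that are not running the expected number of replicas
for dep in apps.list_deployment_for_all_namespaces().items:
    expected = dep.spec.replicas or 0
    available = dep.status.available_replicas or 0
    if available < expected:
        print(f"{dep.metadata.namespace}/{dep.metadata.name}: "
              f"{available}/{expected} replicas available")

# Pods whose containers have restarted or are stuck in a bad waiting state
BAD_REASONS = ("CrashLoopBackOff", "ImagePullBackOff", "ErrImagePull")
for pod in core.list_pod_for_all_namespaces().items:
    for cs in pod.status.container_statuses or []:
        waiting = cs.state.waiting.reason if cs.state and cs.state.waiting else None
        if cs.restart_count > 0 or waiting in BAD_REASONS:
            print(f"{pod.metadata.namespace}/{pod.metadata.name} "
                  f"container={cs.name} restarts={cs.restart_count} waiting={waiting}")
```

The Ready condition you asked about is also available this way, as entries of `pod.status.conditions`, so you can check it in the same loop if you need it.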