I create a Pod with Replica count of say 2, which runs an application ( a simple web-server), basically it's always running command - However due to mis-configuration, sometimes the command exits and the pod is then Terminated.
Due to default restartPolicy
of Always
the pod (and hence the container) is restarted and eventually the Pod status is CrashLoopBackOff
.
If I do a kubectl describe deployment
, it shows Condition as Progressing=True
and Available=False
.
This looks fine - the question is - how do I mark my deployment as 'failed' in the above case?
Adding a spec.ProgressDeadlineSeconds
doesn't seem to be having an effect.
Will simply saying restartPolicy
as Never
be enough in the Pod specification?
A related question, is there a way of getting this information as a trigger/webhook, without doing a rollout status
watch?
There is no Kubernetes concept for a "failed" deployment. Editing a deployment registers your intent that the new ReplicaSet is to be created, and k8s will repeatedly try to make that intent happen. Any errors that are hit along the way will cause the rollout to block, but they will not cause k8s to abort the deployment.
AFAIK, the best you can do (as of 1.9) is to apply a deadline to the Deployment, which will add a Condition that you can detect when a deployment gets stuck; see https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#failed-deployment and https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#progress-deadline-seconds.
It's possible to overlay your own definitions of failure on top of the statuses that k8s provides, but this is quite difficult to do in a generic way; see this issue for a (long!) discussion on the current status of this: https://github.com/kubernetes/kubernetes/issues/1899
Here's some Python code (using pykube
) that I wrote a while ago that implements my own definition of ready; I abort my deploy script if this condition does not obtain after 5 minutes.
def _is_deployment_ready(d, deployment):
if not deployment.ready:
_log.debug('Deployment not completed.')
return False
if deployment.obj["status"]["replicas"] > deployment.replicas:
_log.debug('Old replicas not terminated.')
return False
selector = deployment.obj['spec']['selector']['matchLabels']
pods = Pod.objects(d.api).filter(namespace=d.namespace, selector=selector)
if not pods:
_log.info('No pods found.')
return False
for pod in pods:
_log.info('Is pod %s ready? %s.', pod.name, pod.ready)
if not pod.ready:
_log.debug('Pod status: %s', pod.obj['status'])
return False
_log.info('All pods ready.')
return True
Note the individual pod check, which is required because a deployment seems to be deemed 'ready' when the rollout completes (i.e. all pods are created), not when all of the pods are ready.