Managing the health and well-being of multiple pods with dependencies

5/5/2019

We have several pods (run as Services/Deployments) in our k8s workflow that depend on each other, so that if one goes into a CrashLoopBackOff state, all of these services need to be redeployed.

Instead of doing this manually, is there a programmatic way of handling it?

Of course, we are also trying to figure out why the pod in question is crashing.

-- horcle_buzz
error-handling
kubernetes

2 Answers

5/5/2019

The first thing to do is make sure that the pods start in the correct order. This can be done with initContainers, like this:

spec:
  initContainers:
  - name: waitfor
    image: jwilder/dockerize
    args:
    - -wait
    - "http://config-srv/actuator/health"
    - -wait
    - "http://registry-srv/actuator/health"
    - -wait
    - "http://rabbitmq:15672"
    - -timeout
    - 600s

Here the pod's application containers will not start until every service in the list responds to HTTP requests.
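To make the wait semantics concrete, here is a minimal Python sketch of the same idea: poll each dependency URL until it answers with a 2xx status, or give up once a deadline passes. It is not dockerize's actual implementation; the function names, the 2-second poll interval and the injectable `probe`/`clock`/`sleep` parameters are assumptions made for illustration (they let the logic be exercised without a cluster).

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable, Iterable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if `url` answers with a 2xx status within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, DNS failure, HTTP 4xx/5xx (HTTPError), timeout, bad URL.
        return False


def wait_for(urls: Iterable[str], timeout_s: float = 600.0, interval_s: float = 2.0,
             probe: Callable[[str], bool] = http_ok,
             clock: Callable[[], float] = time.monotonic,
             sleep: Callable[[float], None] = time.sleep) -> bool:
    """Block until every URL passes `probe`; return False if `timeout_s` elapses first."""
    deadline = clock() + timeout_s
    pending = list(urls)
    while pending:
        # Keep only the dependencies that are still not answering.
        pending = [u for u in pending if not probe(u)]
        if not pending:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
    return True
```

In an init container, one would call `wait_for(...)` with the three URLs above and exit with a non-zero status when it returns `False`, so that Kubernetes marks the init container as failed and retries it.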

Next, you may want to define a liveness probe that periodically runs curl against the same services:

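  spec:
    containers:
    - name: app              # placeholder name for your application container
      image: my-app:latest   # placeholder image; it must contain curl
      livenessProbe:
        exec:
          command:
          - /bin/sh
          - -c
          # -f makes curl exit non-zero on HTTP errors such as a 503 from a DOWN health check
          - curl -sf http://config-srv/actuator/health &&
            curl -sf http://registry-srv/actuator/health &&
            curl -sf http://rabbitmq:15672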

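Now, if any of those services fails, your container fails its liveness probe and the kubelet restarts it, and keeps restarting it until the services are back online. Note that a container restart does not re-run the init container; init containers only run again when the whole pod is recreated.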

That's just one example of how it can be done; in your case the checks will of course be different.

-- Vasily Angapov
Source: StackOverflow

5/5/2019

If these are so tightly dependent on each other, I would consider these options:

a) Re-architect your system to be more resilient to failure, so that it tolerates a pod being temporarily unavailable.
b) Put all parts into one pod as separate containers, making the atomic design more explicit.
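Option b) can be sketched as a single Pod manifest with one container per component. Containers in one pod are scheduled onto the same node, share a network namespace (so they reach each other on `localhost` and cannot listen on the same port), and are created and deleted together; the kubelet does, however, restart a crashed container on its own rather than recreating the whole pod. The sketch below builds the manifest as a Python dict and prints it as JSON, which `kubectl apply -f` accepts. The component names, images and ports are hypothetical placeholders, and in practice the same `spec` would go into a Deployment's pod template.

```python
import json
from typing import NamedTuple, Sequence


class Component(NamedTuple):
    """One service that becomes a container in the shared pod (placeholder values)."""
    name: str
    image: str
    port: int


def combined_pod(pod_name: str, components: Sequence[Component]) -> dict:
    """Build a Pod manifest that runs every component as a container in one pod."""
    ports = [c.port for c in components]
    # All containers share one network namespace, so their ports must not collide.
    if len(ports) != len(set(ports)):
        raise ValueError(f"duplicate container ports: {ports}")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name, "labels": {"app": pod_name}},
        "spec": {
            "containers": [
                {"name": c.name, "image": c.image, "ports": [{"containerPort": c.port}]}
                for c in components
            ]
        },
    }


# Hypothetical components; peers reach each other via localhost:<port>.
manifest = combined_pod("my-system", [
    Component("config-srv", "example/config-srv:1.0", 8888),
    Component("registry-srv", "example/registry-srv:1.0", 8761),
    Component("rabbitmq", "rabbitmq:3-management", 5672),
])
print(json.dumps(manifest, indent=2))
```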

If neither fits your needs, you can use the Kubernetes API to write a program that automates restarting all the dependent parts. There are client libraries for multiple languages, and integration is quite easy. The next step would be a custom resource definition (CRD), together with a controller that acts on it, so you can manage your own system through an extension of the Kubernetes API.
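A minimal sketch of such a program, using the official Kubernetes Python client (`pip install kubernetes`). It assumes a hypothetical `DEPENDS_ON` map from each Deployment to the Deployments it needs, and that pods carry an `app` label equal to their Deployment name; both are conventions of this sketch, not something Kubernetes provides. When a pod is in `CrashLoopBackOff`, it restarts that Deployment and everything that (transitively) depends on it by patching the `kubectl.kubernetes.io/restartedAt` annotation on the pod template, the same mechanism `kubectl rollout restart` uses.

```python
from collections import deque
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Set

# Hypothetical dependency map: deployment -> deployments it depends on.
DEPENDS_ON: Dict[str, List[str]] = {
    "registry-srv": ["config-srv"],
    "orders": ["config-srv", "registry-srv", "rabbitmq"],
}


def affected(failed: Iterable[str], depends_on: Dict[str, List[str]]) -> Set[str]:
    """Return the failed deployments plus everything that depends on them, transitively."""
    # Invert the map: dependency -> deployments that need it.
    dependents: Dict[str, Set[str]] = {}
    for app, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(app)
    seen = set(failed)
    queue = deque(seen)
    while queue:  # breadth-first walk over the inverted graph
        for app in dependents.get(queue.popleft(), ()):
            if app not in seen:
                seen.add(app)
                queue.append(app)
    return seen


def restart_patch(now: datetime) -> dict:
    """Patch body that triggers a rolling restart, like `kubectl rollout restart`."""
    stamp = now.astimezone(timezone.utc).isoformat()
    return {"spec": {"template": {"metadata": {"annotations": {
        "kubectl.kubernetes.io/restartedAt": stamp}}}}}


def restart_dependents(namespace: str = "default") -> Set[str]:
    """Find crash-looping pods and roll-restart their deployments and dependents."""
    from kubernetes import client, config  # third-party: pip install kubernetes

    config.load_kube_config()  # use config.load_incluster_config() inside a pod
    core, apps = client.CoreV1Api(), client.AppsV1Api()
    failed = set()
    for pod in core.list_namespaced_pod(namespace).items:
        statuses = pod.status.container_statuses or []
        if any(s.state.waiting and s.state.waiting.reason == "CrashLoopBackOff"
               for s in statuses):
            # Assumes pods are labelled app=<deployment name>.
            failed.add((pod.metadata.labels or {}).get("app"))
    failed.discard(None)
    targets = affected(failed, DEPENDS_ON)
    body = restart_patch(datetime.now(timezone.utc))
    for name in sorted(targets):
        apps.patch_namespaced_deployment(name, namespace, body)
    return targets
```

Running `restart_dependents()` periodically, for example from a CronJob whose ServiceAccount may list pods and patch deployments, would automate the redeploy. It restarts on every call while a pod keeps crash-looping and does no rate limiting, which is the kind of logic a controller built around a CRD would typically add.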

-- Thomas
Source: StackOverflow