How do I restart a container whose site returns 200 to the liveness probe check?

8/25/2019

I have a site hosted in Kubernetes which always returns an HTTP 200 response, even when it fails to pull configuration values from a configuration repository that is hosted elsewhere. What happens is that the nodes hosting the container reboot after OS patching while the configuration repo nodes are still being rebooted. The container nodes come up first, and the containers start up but fail to get the configuration values. The site then always returns 200 with a blank page. Therefore, a liveness probe using GET doesn't see an issue, and the container is not restarted later, so it never gets the config values once the config repo node is up. Is there a custom liveness probe I can write which keeps restarting the container until it successfully gets the config from the repo, once the config repo node comes back online?

I tried setting up a readiness probe, but it behaves the same way, since the site continues to respond with a 200 code even when it can't launch because the config is absent.

-- user2425909
kubernetes

2 Answers

8/25/2019

Yes, you can define a command-based liveness probe, which you have to implement yourself.

-- Akın Özer
Source: StackOverflow

8/26/2019

As @Akın Özer already mentioned, you can use a liveness command. For example, you can cat a file that marks the configuration as loaded, which might look like the following:

    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/config-file-repo
      initialDelaySeconds: 5
      periodSeconds: 5

The periodSeconds field specifies that the kubelet should perform a liveness probe every 5 seconds. The initialDelaySeconds field tells the kubelet that it should wait 5 seconds before performing the first probe.
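An exec probe passes only when its command exits with status 0, so the same mechanism could also test the served page itself rather than a file. The following is a minimal sketch, not a tested setup: the helper name `body_not_blank`, the port 8080, and the path `/` are assumptions for illustration, and it assumes curl and grep are available in the container image.

```shell
#!/bin/sh
# Hypothetical helper: reads a response body on stdin and succeeds (exit 0)
# only if it contains at least one non-whitespace character.
body_not_blank() {
  grep -q '[^[:space:]]'
}

# Possible use as the probe command (URL is an assumption):
#   curl --silent http://localhost:8080/ | body_not_blank
```

If curl cannot connect, the body is empty and the check fails as well, so in this sketch a blank page and an unreachable site are treated the same way.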

You can use this with Lifecycle Hooks. To be more exact:

PostStart

This hook executes immediately after a container is created. However, there is no guarantee that the hook will execute before the container ENTRYPOINT. No parameters are passed to the handler.

In the hook you can check whether the container is able to reach the configuration repo and, if it is, create an empty file /tmp/config-file-repo. This way your liveness probe will know whether the container should be restarted or not.

An example for postStart might be:

    lifecycle:
      postStart:
        exec:
          command:
            - "sh"
            - "-c"
            - >
              if curl --fail -X GET http://configuration_repo_nodes ;then
              touch /tmp/config-file-repo;
              else
              sleep 60;
              fi

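This checks whether configuration_repo_nodes is accessible. If it is, the hook creates the file /tmp/config-file-repo; if it is not, the hook sleeps for 60 seconds and exits without creating the file, so the liveness probe fails and the container is restarted. You can write something else instead of that check.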
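Instead of a single check followed by a fixed sleep, the hook could retry a few times before giving up. Below is a rough sketch under stated assumptions: `wait_for_config` is a hypothetical helper, and the attempt count and delay are arbitrary values you would need to tune.

```shell
#!/bin/sh
# Hypothetical helper: run a check command up to <attempts> times, waiting
# <delay> seconds between tries. Returns 0 on the first success, 1 otherwise.
wait_for_config() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    # Only sleep if another attempt is still coming.
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Possible use inside the postStart command (values are assumptions):
#   wait_for_config 12 5 curl --fail --silent --output /dev/null \
#     http://configuration_repo_nodes && touch /tmp/config-file-repo
```

Keep in mind that, according to the Kubernetes documentation, a container does not reach the running state until its postStart handler completes, and a failing handler causes the container to be killed, so a long retry window delays startup accordingly.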

-- Crou
Source: StackOverflow