Kubernetes liveness probes for Redis queue (RQ) workers

10/6/2019

I have a Kubernetes cluster with a few different pod types:

  • An Nginx frontend,
  • A Flask backend on Gunicorn,
  • Redis, and
  • A Redis Queue (RQ) worker.

From what I can tell, the default liveness probing for the frontend and the Flask backend is sufficient (the probe expects a 200 OK; I have created a '/' route on the backend that returns 200, and all my tasks should run quickly). Crash detection works well.
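
For context, the probe on the Flask/Gunicorn backend is just a plain HTTP check against that '/' route, roughly like the following (the path is the one mentioned above; the port is whatever Gunicorn binds to, so 8000 here is only an example):

    livenessProbe:
      httpGet:
        path: /        # the '/' route that always returns 200
        port: 8000     # example only; use whatever port Gunicorn listens on
      initialDelaySeconds: 5
      periodSeconds: 10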

Additionally, I have set up a liveness monitor that pings Redis with redis-cli. That is also working well.

However, I am not sure whether the default configuration for the RQ worker is good enough. The pod has restarted itself a few times and is generally well behaved, but since I don't know what mechanism is being used, I'm worried.

My questions are: what liveness probe is used for something like an RQ worker, and what can I do to make sure it is robust?

Should I be using something like Supervisor or systemd? Any recommendations on which one?

-- JonathanC
google-kubernetes-engine
kubernetes
redis

2 Answers

10/7/2019

From the stable Redis helm chart: https://github.com/helm/charts/blob/master/stable/redis/templates/health-configmap.yaml

Bitnami has added health checks for Redis in the helm chart. They simply ping the cluster using redis-cli. This mechanism seems to work well enough to be included in the official chart.
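
Stripped down, the idea is an exec probe that runs redis-cli ping and passes when Redis answers PONG. A minimal sketch of that approach (timings are illustrative and no auth is assumed; the chart itself wraps this in small shell scripts mounted from that configmap):

    livenessProbe:
      exec:
        command:
          - sh
          - -c
          - redis-cli -h localhost ping | grep -q PONG
      initialDelaySeconds: 5
      periodSeconds: 10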

-- Palash Goel
Source: StackOverflow

10/14/2019

It would appear that RQ sets a heartbeat key in Redis: https://github.com/rq/rq/blob/e43bce4467c3e1800d75d9cedf75ab6e7e01fe8c/rq/worker.py#L545-L561

You could check whether that key exists. This would probably require an exec probe, though, and at this time I wouldn't recommend that, as exec probes have several open bugs that cause zombie processes and lead to escalating resource usage over time.
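
If you did want to try it anyway, a rough sketch of such a probe might look like the following. This assumes redis-cli is available in the worker image, the Redis service is reachable as redis, and the worker is started with a fixed --name so the heartbeat key rq:worker:<name> is predictable; all of that is guesswork about your setup, and the key layout may differ between RQ versions:

    livenessProbe:
      exec:
        command:
          - sh
          - -c
          # EXISTS returns 1 while the heartbeat key is alive, 0 once its TTL expires
          - test "$(redis-cli -h redis exists rq:worker:my-worker)" = "1"
      initialDelaySeconds: 30
      periodSeconds: 30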

-- coderanger
Source: StackOverflow