php-fpm container livenessProbe with /ping route

12/20/2017

Lately we've been running into issues with our php-fpm processes spinning out of control and making the site unresponsive. There's some obvious php-fpm configuration tuning that needs to be done, but I'd also like to implement a reasonable livenessProbe health check for the php-fpm container that restarts the container when the probe fails.

I've dug up several resources on how to ping the server as a health check (e.g. https://easyengine.io/tutorials/php/fpm-status-page/), but I have yet to find a good answer on what a failing server actually looks like from the probe's side. Will the /ping route return something other than 'pong' if the server is effectively dead? Will it just time out? Assuming the latter, what is a reasonable timeout limit?
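One way to separate "php-fpm is stuck" from "nginx is stuck" (an alternative technique, not what this question uses) is to send the /ping request to php-fpm over FastCGI directly with the `cgi-fcgi` tool from the FastCGI development kit (package names vary by distribution). This is a sketch assuming the pool has `ping.path = /ping` and listens on `127.0.0.1:9000`; `fpm_ping`, `fcgi_body_is_pong` and `FPM_ADDR` are hypothetical names:

```shell
#!/bin/sh
# Alternative sketch (not the author's setup): ask php-fpm for /ping over FastCGI,
# bypassing nginx. Assumes ping.path = /ping in the pool config and a TCP listener;
# FPM_ADDR is a hypothetical override for the listen address.

# Succeeds if the FastCGI response on stdin (headers, blank line, body) has body "pong".
fcgi_body_is_pong() {
  # Strip carriage returns, then delete everything up to and including the first
  # blank line (the end of the headers); what is left is the body.
  body=$(tr -d '\r' | sed '1,/^$/d')
  [ "$body" = "pong" ]
}

# One direct ping; exit status 0 only if php-fpm answered "pong".
fpm_ping() {
  SCRIPT_NAME=/ping SCRIPT_FILENAME=/ping REQUEST_METHOD=GET \
    cgi-fcgi -bind -connect "${FPM_ADDR:-127.0.0.1:9000}" | fcgi_body_is_pong
}
```

`cgi-fcgi` has no timeout of its own, so it would still need a wrapper such as `timeout 2 sh -c '. ./fpm-ping.sh; fpm_ping'` (file name hypothetical) to turn a hang into a failure.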

In my own tests, a healthy php-fpm server returns the 'pong' response quickly:

# time curl localhost/ping
pong
real    0m0.040s
user    0m0.006s
sys     0m0.001s
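To put numbers behind the idle vs. loaded comparison, a sketch like the following samples /ping latency repeatedly and summarizes it against a candidate limit. `sample_ping`, `summarize_latency` and `PING_URL` are hypothetical names, and the 5-second cap per request is an arbitrary choice:

```shell
#!/bin/sh
# Sketch for comparing /ping latency when idle vs. under load.
# PING_URL and the function names are hypothetical; adjust to your setup.

# Print curl's total request time (seconds) for N requests, one per line, 1s apart.
sample_ping() {
  n=${1:-10}
  i=0
  while [ "$i" -lt "$n" ]; do
    curl -s -o /dev/null --max-time 5 -w '%{time_total}\n' "${PING_URL:-http://localhost/ping}"
    i=$((i + 1))
    sleep 1
  done
}

# Read latencies (seconds, one per line) on stdin and report the sample count,
# the slowest sample, and how many exceeded the limit given as $1 (default 2s).
summarize_latency() {
  awk -v limit="${1:-2}" '
    BEGIN { lim = limit + 0; max = 0; slow = 0 }
    { t = $1 + 0; if (t > max) max = t; if (t > lim) slow++ }
    END { printf "samples=%d max=%.3f over_%ss=%d\n", NR, max, limit, slow }
  '
}
```

For example, `sample_ping 30 | summarize_latency 2` run once at idle and once under load shows how much headroom a 2-second limit leaves.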

Under simulated heavy load, the 'pong' response took 1-3 seconds, which coincided with the site becoming unresponsive. Based on that, I drafted a livenessProbe that fails and restarts the container if the probe script takes longer than 2 seconds on 2 consecutive probes:

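livenessProbe:
  exec:
    command:
    - sh
    - -c
    - timeout 2 /var/www/livenessprobe.sh
  initialDelaySeconds: 15
  periodSeconds: 3
  # Kubernetes 1.20+ enforces timeoutSeconds (default 1) on exec probes;
  # keep it above the 2-second `timeout` so the script's own limit applies.
  timeoutSeconds: 3
  successThreshold: 1
  failureThreshold: 2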
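As a rough check on whether these settings are aggressive or conservative, the time from php-fpm hanging to the kubelet deciding to restart can be estimated. This assumes the 2-second `timeout` is what ends a hung probe (kubelet's own `timeoutSeconds` is not shorter), that probes start a full `periodSeconds` apart, and it ignores a probe that happens to be in flight at the moment of the hang:

```latex
% T = in-script timeout (2 s), P = periodSeconds (3 s), F = failureThreshold (2)
% The first failing probe starts at some t_0 in (0, P] after php-fpm stops answering;
% the F-th consecutive failure completes (F-1)P + T after that.
t_{\text{restart}} = t_0 + (F-1)P + T, \qquad 0 < t_0 \le P
\;\Rightarrow\;
(F-1)P + T < t_{\text{restart}} \le FP + T
\;\Rightarrow\;
5\,\text{s} < t_{\text{restart}} \le 8\,\text{s}
```

The container restart itself comes on top of this window.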

And the probe script is simply this (for reasons I won't get into, it needs to be a shell script rather than a direct httpGet probe):

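  #!/bin/bash

  # -f makes curl exit non-zero on HTTP errors (e.g. a 502 from nginx when
  # php-fpm is down); the comparison also fails if the body isn't "pong".
  [ "$(curl -sf localhost/ping)" = "pong" ]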
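On the question of what failure looks like, the probe can record *why* /ping failed before exiting non-zero, which helps distinguish a timeout from an error response. This is a sketch, not a tested production probe; `ping_status`, `ping_probe`, `PING_URL` and `PING_TIMEOUT` are hypothetical names, and the mapping relies on curl's documented exit codes (7, 22 with `--fail`, 28):

```shell
#!/bin/sh
# Sketch of a probe that reports why /ping failed before exiting non-zero.
# ping_status, ping_probe, PING_URL and PING_TIMEOUT are hypothetical names.

# Map curl's exit status ($1) and response body ($2) to a short verdict.
ping_status() {
  case "$1" in
    0)  if [ "$2" = "pong" ]; then echo ok; else echo unexpected-body; fi ;;
    7)  echo connect-failed ;;   # nothing accepted the connection
    22) echo http-error ;;       # with -f: HTTP status >= 400, e.g. a 502 or 504
    28) echo timeout ;;          # --max-time exceeded
    *)  echo "curl-exit-$1" ;;
  esac
}

# One probe attempt: exit 0 only for a timely "pong"; the verdict goes to stderr.
ping_probe() {
  body=$(curl -sf --max-time "${PING_TIMEOUT:-2}" "${PING_URL:-http://localhost/ping}")
  rc=$?
  verdict=$(ping_status "$rc" "$body")
  echo "ping: $verdict" >&2
  [ "$verdict" = ok ]
}
```

With `--max-time` inside the script, the outer `timeout 2` in the livenessProbe becomes a backstop rather than the only limit.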

Now I don't know whether I'm being too aggressive or too conservative. I'll be running a canary deploy to test this, but in the meantime I'd like feedback from others who have implemented health checks on php-fpm servers, bonus points if it's in a Kubernetes context.

-- erstaples
fpm
kubernetes
php