Microservice liveness and readiness timeout handling

4/17/2020

I have pods deployed in a Kubernetes cluster that serve HTTPS requests. I am load testing the APIs with a number of concurrent users per second.

During the load tests, the container gets killed because the liveness and readiness probes fail, and the pods get restarted. Because of this, my API requests fail.

liveness:
  initialDelaySeconds: 60
  periodSeconds: 20
  timeoutSeconds: 60
  successThreshold: 1
  failureThreshold: 4

readiness:
  initialDelaySeconds: 60
  periodSeconds: 20
  timeoutSeconds: 60
  successThreshold: 1
  failureThreshold: 4

livenessProbe:
  httpGet:
    path: /health
    port: 8000
    scheme: HTTPS
readinessProbe:
  httpGet:
    path: /health
    port: 8000
    scheme: HTTPS

How can I avoid these failures? Is it because my application cannot serve the liveness requests under load?

-- Gopi
kubernetes
kubernetes-pod

1 Answer

4/20/2020

You would need to be more specific about which load tests exactly you are running and which tools you are using.

Please check the pod description for the events that led to the container being restarted; you can do this with kubectl describe pod <pod_name>. Your pod may be running low on memory or CPU, so you might look into requests and limits in Managing Compute Resources for Containers.
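As a rough illustration of the requests and limits mentioned above, a container spec might look like the sketch below. The container name, image, and every resource value here are placeholders I made up, not values taken from the question; appropriate numbers depend on what the load test shows.

```yaml
# Hypothetical fragment of a Deployment's pod template (spec.template.spec).
# All names and values are assumptions for illustration only.
containers:
  - name: my-api            # hypothetical container name
    image: my-api:latest    # hypothetical image
    resources:
      requests:             # what the scheduler reserves for the container
        cpu: 250m
        memory: 256Mi
      limits:               # CPU above the limit is throttled; memory above it gets the container OOM-killed
        cpu: "1"
        memory: 512Mi
```

If the CPU limit is too low, the process may be throttled under load and answer /health too slowly, which would look the same as a failing probe.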

You mentioned that the readiness and liveness probes fail while the test is running. That would indicate that you are stressing the HTTPS server to the point where it is no longer able to serve requests, including the probe requests.

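To perform a probe, the kubelet sends an HTTPS GET request to the server that is running in the container and listening on port 8000. If the handler for the server’s /health path returns a success code, the kubelet considers the container to be alive and healthy. If the handler returns a failure code, or does not answer within timeoutSeconds, the probe fails. After failureThreshold consecutive liveness failures, the kubelet kills the container and restarts it; after the same number of readiness failures, the pod is marked unready and removed from Service endpoints rather than restarted.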

Any code greater than or equal to 200 and less than 400 indicates success. Any other code indicates failure.

To mitigate those failures you could increase timeoutSeconds and/or failureThreshold, but I think you should work on the process that is serving the requests, so that /health stays responsive under load.
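A minimal sketch of that mitigation, assuming the timing fields from the question are placed directly on each probe in the container spec. The container name and image are hypothetical, and the raised failureThreshold of 8 is an arbitrary example value, not a recommendation.

```yaml
# Hypothetical fragment of a Deployment's pod template (spec.template.spec).
containers:
  - name: my-api            # hypothetical container name
    image: my-api:latest    # hypothetical image
    livenessProbe:
      httpGet:
        path: /health
        port: 8000
        scheme: HTTPS
      initialDelaySeconds: 60
      periodSeconds: 20
      timeoutSeconds: 60
      successThreshold: 1   # must be 1 for liveness probes
      failureThreshold: 8   # assumed value, raised from 4 to tolerate more consecutive failures
    readinessProbe:
      httpGet:
        path: /health
        port: 8000
        scheme: HTTPS
      initialDelaySeconds: 60
      periodSeconds: 20
      timeoutSeconds: 60
      successThreshold: 1
      failureThreshold: 4   # unchanged; readiness failures only stop traffic to the pod
```

Keeping the readiness threshold lower than the liveness threshold means an overloaded pod stops receiving traffic before the kubelet restarts it, which gives it a chance to recover. The effective probe settings should be visible in the output of kubectl describe pod <pod_name>.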

I also recommend reading this great guide: Kubernetes best practices: Setting up health checks with readiness and liveness probes.

-- Crou
Source: StackOverflow