I'm using kube-aws
to set up a production Kubernetes cluster on AWS. Now I'm running into an issue I am unable to recreate in my dev environment.
When a pod is running heavy computation, which in my case happens in bursts, the liveness and readiness probes stop responding. In my local environment, where I run docker compose, they work as expected.
The probes use simple HTTP 204 No Content output using native Go functionality.
Has anyone seen this before?
EDIT:
I'd be happy to provide more information, but I am uncertain what I should provide as there is a tremendous amount of configuration and code involved. Primarily, I'm looking for help to troubleshoot and where to look to try to locate the actual issue.