I have a Kubernetes cluster on AWS, set up with kops. I set up a Deployment that runs an Apache container, and a Service for the Deployment (type: LoadBalancer). When I update the Deployment by running kubectl set image ..., as soon as the first pod of the new ReplicaSet becomes ready, the first couple of requests to the Service time out.
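For reference, a minimal sketch of the kind of setup described above (the names, image, and replica count are illustrative, not from the original post):

```yaml
# Hypothetical sketch of the setup: a Deployment running Apache,
# exposed through a Service of type LoadBalancer (backed by an ELB on AWS).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: apache
spec:
  replicas: 3
  selector:
    matchLabels:
      app: apache
  template:
    metadata:
      labels:
        app: apache
    spec:
      containers:
        - name: apache
          image: httpd:2.4
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: apache
spec:
  type: LoadBalancer
  selector:
    app: apache
  ports:
    - port: 80
      targetPort: 80
```

A rolling update would then be triggered with something like `kubectl set image deployment/apache apache=<new-image>`.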
Things I have tried:

- A readinessProbe on the pod: works.
- curl localhost on a pod: works.
- curl the pod IP and node IPs: works normally.
- An externalName, selector-less Service for a couple of external dependencies, thinking it might have something to do with DNS lookups: didn't help.
- curl the IP returned by the DNS lookup from inside a pod: the first request times out. This tells me it's not an ELB issue.

It's really frustrating, since otherwise our Kubernetes stack is working great, but every time we deploy our application we run the risk of a user's request timing out.

After a lot of debugging, I think I've solved this issue.

TL;DR: Apache has to exit gracefully.

I set up a preStop lifecycle hook on the pod that runs apachectl -k graceful-stop, so Apache terminates gracefully before the pod is killed.
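A preStop hook like the one described might look like this in the container spec (a sketch; the apachectl command is the one from the answer, the rest of the surrounding spec is assumed):

```yaml
# Sketch of a preStop lifecycle hook on the Apache container.
# Kubernetes runs this command before sending TERM to the container.
lifecycle:
  preStop:
    exec:
      # Ask Apache to finish serving in-flight requests, then exit.
      command: ["apachectl", "-k", "graceful-stop"]
```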
The issue (at least from what I can tell) is that when pods are taken down during a deployment, they receive a TERM signal, which causes Apache to immediately kill all of its children. This can cause a race condition where kube-proxy still sends some traffic to pods that have received the TERM signal but have not yet terminated completely.
I also got some help from this blog post on how to set up the hook.
I also recommend increasing terminationGracePeriodSeconds in the PodSpec so Apache has enough time to exit gracefully.
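In the PodSpec this could look like the following (the 60-second value is an illustrative assumption; it just needs to comfortably exceed the time Apache takes to drain its connections, and the default is 30 seconds):

```yaml
# PodSpec fragment: give Apache extra time to finish in-flight requests
# before Kubernetes force-kills the container with SIGKILL.
spec:
  terminationGracePeriodSeconds: 60  # illustrative value; default is 30
  containers:
    - name: apache
      image: httpd:2.4
```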