I have HPA for my Kubernetes-deployed app with cluster autoscaler. Scaling works properly for both pods and nodes, but during production load spikes I see a lot of 502 errors from ALB (aws-load-balancer-controller).
It seems like I have enabled everything to achieve zero-downtime deployment / scaling:
```yaml
readinessProbe:
  httpGet:
    path: /_healthcheck/
    port: 80
```
* pod readiness gate [is enabled](https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.2/deploy/pod_readiness_gate/) (see the namespace label sketch after this list)
* ingress annotation uses the `ip` target type:

  ```yaml
  alb.ingress.kubernetes.io/target-type: ip
  ```
* healthcheck parameters are specified on the ingress resource:

  ```yaml
  alb.ingress.kubernetes.io/healthcheck-path: "/healthcheck/"
  alb.ingress.kubernetes.io/healthcheck-interval-seconds: "10"
  ```
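For completeness, readiness gate injection is enabled per namespace via a label, roughly like this (the namespace name `my-app` is a placeholder for my app's namespace):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: my-app    # placeholder namespace name
  labels:
    # aws-load-balancer-controller injects readiness gates into pods
    # created in namespaces carrying this label
    elbv2.k8s.aws/pod-readiness-gate-inject: enabled
```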
but that doesn't help.
How do I properly debug this kind of issue, and which other parameters should I tune to completely eliminate 5xx errors from my load balancer?
Here's a list of some extra things I've added to my configuration alongside those mentioned above:
* a `preStop` hook (see the combined pod spec sketch after this list):

  ```yaml
  lifecycle:
    preStop:
      exec:
        command: ["/bin/sleep", "30"]
  ```
* a termination grace period on the pod (sleep time from the above + 10-15 seconds):

  ```yaml
  terminationGracePeriodSeconds: 40
  ```
* a tuned deregistration delay on the target group, set via this annotation on the ingress resource:

  ```yaml
  alb.ingress.kubernetes.io/target-group-attributes: deregistration_delay.timeout_seconds=30
  ```

  Usually the value should match the timeout on your backend webserver (we don't want to keep a target around longer than it takes for the longest possible request to finish).
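To make the pieces above concrete, here's a minimal sketch of how the readiness probe, `preStop` hook and grace period sit together in the Deployment's pod template (the names and image are placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                              # placeholder name
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      terminationGracePeriodSeconds: 40     # preStop sleep (30s) + 10s headroom
      containers:
        - name: app                         # placeholder container name
          image: my-app:latest              # placeholder image
          ports:
            - containerPort: 80
          readinessProbe:
            httpGet:
              path: /_healthcheck/
              port: 80
          lifecycle:
            preStop:
              exec:
                # keep the pod serving in-flight requests while the ALB
                # deregisters and drains the target
                command: ["/bin/sleep", "30"]
```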
The main idea behind this tuning is to make sure changes to the Pods' state have enough time to propagate to the underlying AWS resources, so the ALB no longer routes traffic to a pod in the target group that has already been marked as terminated/unhealthy by Kubernetes.
P.S. Make sure to always have enough pods to handle incoming requests (this is especially important for synchronous workers when doing a rolling redeploy). Consider lower values for `maxUnavailable` and higher values for `maxSurge` in case your cluster/worker nodes have the capacity to allocate these extra pods. So if your pod handles 100 reqs/min on average and your load is 400 reqs/min, make sure `replicas - maxUnavailable` > 4 (total reqs / reqs per pod), as sketched below.
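For illustration, a rolling update strategy matching that arithmetic could look roughly like this Deployment spec fragment (the concrete numbers are just an example):

```yaml
spec:
  replicas: 6
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1    # 6 - 1 = 5 pods stay available, > 4 required
      maxSurge: 2          # extra pods during rollout, if nodes have capacity
```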