I have an application service running with 2 pods that are receiving traffic from another service in the Kubernetes cluster.
I am facing an issue where my pods are getting terminated without fulfilling the in-flight requests.
To fix this, I added a preStop lifecycle hook that sleeps for 250 seconds so that pending requests can complete, and set terminationGracePeriodSeconds to 300 seconds.
lifecycle:
  preStop:
    exec:
      command:
        - /bin/sleep
        - "250"
What I expected was that the moment the pod transitioned to the Terminating state, it would stop receiving new requests and only fulfill the requests it already had, but this did not happen.
The pod continued to receive traffic until the last second, was eventually killed once the preStop hook completed, and the calling application received a 502 error code.
To debug this, I next suspected that the endpoints were not being updated properly in the Service, but I was wrong: the moment the pod transitioned to the Terminating state, its endpoint was removed from the Service, yet the pod continued to receive traffic.
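The endpoint removal can be verified by watching the Endpoints (or EndpointSlices) while deleting a pod; the service name below is taken from the iptables comments further down:
>kubectl get endpoints test-cart-service -w
>kubectl get endpointslices -l kubernetes.io/service-name=test-cart-service -w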
Then I logged in to the node and checked the iptables NAT rules to verify whether the pod IP was still there, but the forwarding rules were also updated immediately when the pod transitioned to Terminating. While the pod was still running, the rules looked like this:
>sudo iptables -t nat -L KUBE-SVC-MYVS2X43QAGQT6BT -n | column -t
Chain KUBE-SVC-MYVS2X43QAGQT6BT (1 references)
target prot opt source destination
KUBE-SEP-HDK3MJ4L3R3PLTOQ all -- <IP> <IP> /* default/test-cart-service:http */ statistic mode random probability 0.50000000000
KUBE-SEP-237VYZFMFXN2THCB all -- <IP> <IP> /* default/test-cart-service:http */
>sudo iptables -t nat -L KUBE-SEP-HDK3MJ4L3R3PLTOQ -n | column -t
Chain KUBE-SEP-HDK3MJ4L3R3PLTOQ (1 references)
target prot opt source destination
KUBE-MARK-MASQ all -- <IP> <IP> /* default/test-cart-service:http */
DNAT tcp -- <IP> <IP> /* default/test-cart-service:http */ tcp to:<IP>:<PORT>
After the pod entered the Terminating state, the same lookup showed that the chain had already been removed:
>sudo iptables -t nat -L KUBE-SEP-HDK3MJ4L3R3PLTOQ -n | column -t
iptables: No chain/target/match by that name.
So the iptables rules were updated immediately by kube-proxy, but the pod was still receiving traffic.
I tried port-forwarding to the Service directly as well as hitting it through the full network path (ELB -> Ingress -> Service -> pod), thinking it might be a load balancer issue, but the results were the same.
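For the direct test, something along these lines works; the local and service ports below are placeholders:
>kubectl port-forward svc/test-cart-service 8080:80
>curl -v http://localhost:8080/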
Kubernetes is managed by Amazon EKS (1.20), and the Ingress service is mapped to an Amazon Classic Load Balancer (ELB).
I am not sure what I am missing, and whether this is an EKS bug or expected Kubernetes behavior.
Update:
Tested the same scenario on GKE and it worked as expected: the moment the pod entered the Terminating state, it stopped receiving traffic and only completed the pending requests.