Is there a way to configure Istio to route traffic to a Pod that is in the Terminating state?

10/9/2018

I have a Kubernetes cluster with two services deployed: SvcA and SvcB - both in the service mesh.

SvcA is backed by a single Pod, SvcA_P1. The application in SvcA_P1 exposes a PreStop HTTP hook. When I run "kubectl drain" on the node where SvcA_P1 resides, the Pod transitions into the "terminating" state and remains there until the application has completed its work (the REST request returns and Kubernetes removes the Pod). The work for SvcA_P1 includes completing ongoing in-dialog HTTP requests/responses, i.e. those belonging to established sessions. The Pod can stay in the "terminating" state for hours before completing.

When the Pod enters the "terminating" state, the Istio sidecar appears to remove SvcA_P1 from the pool. Requests sent to SvcA_P1 from, e.g., SvcB_P1 are rejected with a "no healthy upstream" error.

Is there a way to configure Istio/Envoy to:

  1. Continue to send traffic/sessions with affinity to SvcA_P1 while in "terminating" state?
  2. Reject traffic without session affinity to SvcA_P1 (no JSESSIONID, cookies, or special HTTP headers)?

I have played around with DestinationRules, modifying trafficPolicy.loadBalancer.consistentHash.[httpHeaderName|httpCookie], with no luck. Once Envoy removes the upstream server, requests are re-hashed across the reduced set of servers, so a session that was pinned to SvcA_P1 lands on a different destination.
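The re-hashing effect can be reproduced outside the mesh with a small simulation. The following is a minimal sketch of a generic hash ring, not Envoy's actual implementation; the `HashRing` class, the extra hosts `SvcA_P2` and `SvcA_P3`, and the `JSESSIONID-n` keys are all hypothetical and exist only for illustration.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a 64-bit point on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    """Toy consistent-hash ring (illustrative only, not Envoy's code)."""

    def __init__(self, hosts, vnodes=100):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (point, host)
        for host in hosts:
            self.add(host)

    def add(self, host):
        # Each host owns several virtual nodes to even out the distribution.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{host}#{i}"), host))

    def remove(self, host):
        self._ring = [(p, h) for p, h in self._ring if h != host]

    def pick(self, key):
        # An empty ring models the "no healthy upstream" case.
        if not self._ring:
            return None
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


# Hypothetical pool: the real SvcA has a single Pod; P2 and P3 are assumed.
ring = HashRing(["SvcA_P1", "SvcA_P2", "SvcA_P3"])
sessions = [f"JSESSIONID-{n}" for n in range(1000)]

before = {s: ring.pick(s) for s in sessions}
ring.remove("SvcA_P1")  # the terminating Pod leaves the pool
after = {s: ring.pick(s) for s in sessions}

moved = [s for s in sessions if before[s] != after[s]]
print(f"{len(moved)} of {len(sessions)} sessions changed destination")
```

Only the sessions that were pinned to the removed host move; the others keep their destination. That is the expected property of consistent hashing, and it is also why hash-based affinity alone cannot keep traffic on a host that is no longer in the pool.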

Thanks,

Thor

-- T. Sandgren
istio
kubernetes

1 Answer

10/10/2018

According to the Kubernetes documentation, when a Pod is to be deleted, three things happen simultaneously:

  • Pod shows up as “Terminating” when listed in client commands
  • When the Kubelet sees that a Pod has been marked as terminating because the "dead" timer for the Pod has been set in the API server, it begins the Pod shutdown process.
    • If the Pod has defined a preStop hook, it is invoked inside the Pod. If the preStop hook is still running after the grace period expires, the grace period is extended once by a small amount (2 seconds).
  • The Pod is removed from the endpoints list for the Service and is no longer considered part of the set of running Pods for replication controllers. Pods that shut down slowly cannot continue to serve traffic, as load balancers (like the service proxy) remove them from their rotations.

Since Istio works as a mesh network beneath Kubernetes Services, and Services no longer consider a Pod in the Terminating state as a destination for traffic, tweaking Istio policies doesn't help much.
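As a rough illustration of that reasoning, here is a simplified model of how a Service's endpoint list is derived. It is only a sketch under stated assumptions: the `Pod` dataclass, the `service_endpoints` function, the IP address, and the timestamp are hypothetical and are not Kubernetes or Istio APIs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pod:
    name: str
    ip: str
    ready: bool = True
    # Set when deletion is requested; the Pod then lists as "Terminating".
    deletion_timestamp: Optional[str] = None


def service_endpoints(pods: List[Pod]) -> List[str]:
    """Return the IPs a Service would advertise: ready Pods not being deleted."""
    return [p.ip for p in pods if p.ready and p.deletion_timestamp is None]


pods = [Pod("SvcA_P1", "10.0.0.11")]  # hypothetical address
print(service_endpoints(pods))  # SvcA_P1 is routable

pods[0].deletion_timestamp = "2018-10-09T12:00:00Z"  # hypothetical drain time
print(service_endpoints(pods))  # empty list: nothing left for the sidecar to route to
```

In this model the Pod drops out of the list as soon as deletion is requested, regardless of how long the preStop hook keeps it alive, so any load-balancing policy layered on top only ever sees the reduced set.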

-- VAS
Source: StackOverflow