Bare-Metal K8s: How to preserve source IP of client and direct traffic to nginx replica on current server

7/23/2019

I would like to ask for some assistance:

The entrypoint to the cluster for HTTP/HTTPS is NGINX (quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.25.0) running as a DaemonSet.
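
For context, a minimal sketch of how such a DaemonSet entrypoint is commonly wired on bare metal; the hostNetwork setup shown here is an assumption, not necessarily the manifest actually in use:

    # Hypothetical sketch: ingress controller as a DaemonSet bound to host ports.
    # hostNetwork lets nginx listen on 80/443 of every node directly,
    # one common way to expose an ingress controller without NodePort.
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: nginx-ingress-controller
      namespace: ingress-nginx
    spec:
      selector:
        matchLabels:
          app: nginx-ingress
      template:
        metadata:
          labels:
            app: nginx-ingress
        spec:
          hostNetwork: true                  # pod shares the node's network namespace
          dnsPolicy: ClusterFirstWithHostNet
          containers:
            - name: nginx-ingress-controller
              image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.25.0
              ports:
                - containerPort: 80
                - containerPort: 443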

I want to achieve 2 things:

  1. preserve source IP of client
  2. direct traffic to the nginx replica on the current server (so if a request is sent to server A, listed as an externalIP address, the nginx on node A should handle it)

Questions:

  • How can this be achieved?
  • Is it possible without NodePort? The control plane can be started with a custom --service-node-port-range, so I could add node ports for 80 and 443, but that looks a little bit like a hack (after reading about the intended usage of NodePort); see the sketch after this list.
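
For illustration, the NodePort route would look roughly like this; the Service name and labels are placeholders:

    # Illustrative only: pinning the ingress Service to node ports 80/443.
    # Requires the API server to be started with a widened range, e.g.:
    #   kube-apiserver --service-node-port-range=80-32767
    apiVersion: v1
    kind: Service
    metadata:
      name: nginx-ingress
      namespace: ingress-nginx
    spec:
      type: NodePort
      selector:
        app: nginx-ingress
      ports:
        - name: http
          port: 80
          targetPort: 80
          nodePort: 80      # only valid once the range includes 80
        - name: https
          port: 443
          targetPort: 443
          nodePort: 443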

I was considering MetalLB, but a layer 2 configuration would create a bottleneck (the cluster handles high traffic). I am not sure whether BGP mode would solve this problem.
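
For reference, the mode is chosen per address pool in MetalLB's configuration; a sketch of the 0.x ConfigMap format, with placeholder addresses and ASNs:

    # Hypothetical MetalLB config: in layer 2 mode one elected node answers for a
    # service IP, while in BGP mode every peered node can attract traffic.
    apiVersion: v1
    kind: ConfigMap
    metadata:
      namespace: metallb-system
      name: config
    data:
      config: |
        peers:
        - peer-address: 10.0.0.1    # placeholder: a BGP-capable router
          peer-asn: 64501
          my-asn: 64500
        address-pools:
        - name: default
          protocol: bgp             # alternative: layer2
          addresses:
          - 192.168.10.0/24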

  • Kubernetes v1.15
  • Bare-metal
  • Ubuntu 18.04
  • Docker (18.09) and WeaveNet (2.6)
-- Hoggie
kubernetes
kubernetes-ingress
kubernetes-service

1 Answer

8/27/2019

You can preserve the source IP of the client by setting externalTrafficPolicy to Local; kube-proxy will then only proxy requests to endpoints on the local node. This is explained in Source IP for Services with Type=NodePort.
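
Concretely, this is a single field on the Service; a minimal sketch with illustrative names and ports:

    # Minimal sketch: keep traffic on the node that received it and
    # preserve the client source IP (no SNAT, no extra hop between nodes).
    apiVersion: v1
    kind: Service
    metadata:
      name: nginx-ingress
      namespace: ingress-nginx
    spec:
      type: LoadBalancer              # Local also works with type: NodePort
      externalTrafficPolicy: Local    # only route to endpoints on this node
      selector:
        app: nginx-ingress
      ports:
        - name: http
          port: 80
          targetPort: 80
        - name: https
          port: 443
          targetPort: 443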

You should also have a look at Using Source IP.

In case of MetalLB:

MetalLB respects the service’s externalTrafficPolicy option, and implements two different announcement modes depending on what policy you select. If you’re familiar with Google Cloud’s Kubernetes load balancers, you can probably skip this section: MetalLB’s behaviors and tradeoffs are identical.

“Local” traffic policy

With the Local traffic policy, nodes will only attract traffic if they are running one or more of the service’s pods locally. The BGP routers will load-balance incoming traffic only across those nodes that are currently hosting the service. On each node, the traffic is forwarded only to local pods by kube-proxy; there is no “horizontal” traffic flow between nodes.

This policy provides the most efficient flow of traffic to your service. Furthermore, because kube-proxy doesn’t need to send traffic between cluster nodes, your pods can see the real source IP address of incoming connections.

The downside of this policy is that it treats each cluster node as one “unit” of load-balancing, regardless of how many of the service’s pods are running on that node. This may result in traffic imbalances to your pods.

For example, if your service has 2 pods running on node A and one pod running on node B, the Local traffic policy will send 50% of the service’s traffic to each node. Node A will split the traffic it receives evenly between its two pods, so the final per-pod load distribution is 25% for each of node A’s pods, and 50% for node B’s pod. In contrast, if you used the Cluster traffic policy, each pod would receive 33% of the overall traffic.

In general, when using the Local traffic policy, it’s recommended to finely control the mapping of your pods to nodes, for example using node anti-affinity, so that an even traffic split across nodes translates to an even traffic split across pods.
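
A sketch of that recommendation, using pod anti-affinity keyed on the node hostname so that replicas land on different nodes; all names and the image are placeholders:

    # Illustrative: spread the service's pods one-per-node, so the Local
    # policy's even per-node split is also an even per-pod split.
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-service
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: my-service
      template:
        metadata:
          labels:
            app: my-service
        spec:
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                - labelSelector:
                    matchLabels:
                      app: my-service
                  topologyKey: kubernetes.io/hostname
          containers:
            - name: app
              image: nginx:1.17    # placeholder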

You also need to take into account the limitations of the BGP routing protocol when using MetalLB.

Please also have a look at this blog post: Using MetalLB with Kind.

-- Crou
Source: StackOverflow