How does Traefik / Ngnix - (Ingress Controllers) forwards request to two different services having configured with same port number.?

8/29/2020

Basically I have Following Hdfs Cluster setup using docker-compose:

Node 1 with IP: 192.168.1.1 having service deployed as below:

Namenode1:9000
HMaster1: 8300
ZooKeeper1:1291

Node 2 with IP: 192.168.1.2 having service deployed as below:

Namenode2:9000
ZooKeeper2:1291

How does Traefik / Ngnix - (Ingress Controllers) forwards request to two different services having configured with same port number?

-- Jacksquad
docker-compose
kubernetes
nginx-ingress
traefik
traefik-ingress

1 Answer

8/29/2020

There are several great tutorials on how ingress and load balancing works in kubernetes, e.g. this one by Mark Betz. As a general rule, it helps to think in terms of services and workloads instead of specific nodes where your workloads are running on.

A workload deployed in Kubernetes (a so called Pod) has its own internal IP address, called a ClusterIP. That pod can have one or more ports open, just on that pod-owned ip address.

If you now have several pods to distribute the load, e.g. like 5 web server processes or backend logic, it would be hard for a client (inside the cluster) to keep track of all those pod IPs, because they also change when a pod is updated or just restarted due to a crash. This is why Kubernetes has a so called concept of services. Those provide a stable DNS name and IP which then transparently "forwards" to one of the healthy pods. So your client only needs to know the DNS name and not keep track of the specific pod IPs.

If you now want to expose such a service to the public, there are different ways. Either you set your service to type: LoadBalancer which then sets up some load balancer infrastructure on your cloud provider and routes traffic to the nodes and then to the pods - or - you already have an ingress controller in place and just define the routing based on host names and paths. An ingress controller itself is such a loadbalanced service with an attached cloud load balancer and also has some pods (with e.g. a traefik or nginx container) which then route your packets accordingly.

So coming back to your initial question: If you want to expose a service with several pods of the same kind, then you would first create a Service resource that matches your Pods using the selector and then you create one single ingress resource that provides a hostname/path and references this service. The ingress controller will pick up those ingress resources and configure the traefik or nginx accordingly. The ingress controller doesn't really care about the host IPs and port numbers, because it acts on the internal kubernetes ClusterIPs, so you even don't need (and shouldn't) expose such a service directly when you have an ingress in place.

I hope this answers your question regarding exposing two workloads over an ingress controller. For details, check the Kubernetes docs on Ingresses. Based on the services you named (zookeeper, hdfs) load balancing and ingresses might not be what you need for that case. Zookeeper instances should be internal in most cases and need to be adressed individually, so you might want to check out headless services, for this use case. Also check the Kubernetes docs for a way to run zookeeper.

-- Andreas Jägle
Source: StackOverflow