k8s Ingress affects LoadBalancer on different domain

11/17/2017

TL;DR: An nginx-ingress-controller interferes with another LoadBalancer service on a different domain roughly once every 5 requests.

I have a weird situation with Kubernetes on GCE, and I am stuck. I don't know if I have a configuration error or if I have stumbled upon a (very severe) bug in k8s.

I have two LoadBalancer services, each with their own static IP and a DNS record pointing to them.

One LoadBalancer points (through its selector) directly to a Deployment running my API webserver; this is api.domain.com. The API cannot sit behind an ingress controller because of a complex client-side certificate authentication scheme, which is not (yet) possible with the nginx ingress.

The other LoadBalancer service points to an NGINX ingress controller, which serves my website at site.domain.com. I use a standard nginx-default-backend to serve 404s from the ingress controller.

The issue is that when I load the API (at api.domain.com) in a browser, roughly one in every 3 or 4 refreshes gets the 404 served by nginx-default-backend.

So once every 5 times or so, a page from a totally different domain (site.domain.com, 234.234.234.234) is served on my API domain (api.domain.com, 123.123.123.123). I don't understand how this can happen.

Once I remove the nginx-ingress-controller, the API functions normally again. I'm really puzzled.
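The behaviour is reproducible outside the browser as well; something like this should trigger it while taking DNS and browser caching out of the equation (a sketch; -k skips certificate verification, since the default backend won't present a valid certificate for this domain):

# Pin api.domain.com to the API's static IP and print only the HTTP
# status code; repeating the request reproduces the intermittent 404.
for i in $(seq 1 10); do
  curl -sk --resolve api.domain.com:443:123.123.123.123 \
       -o /dev/null -w '%{http_code}\n' https://api.domain.com/
done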

For the API:

apiVersion: v1
kind: Service
metadata:
  name: api
spec:
  type: LoadBalancer
  loadBalancerIP: 123.123.123.123
  selector:
    app: api
  ports:
  - port: 443

And for the website:

apiVersion: v1
kind: Service
metadata:
  name: nginx-ingress-lb
  labels:
    app: nginx-ingress-lb
spec:
  type: LoadBalancer
  loadBalancerIP: 234.234.234.234
  ports:
  - port: 443
    name: https
  selector:
    # Selects nginx-ingress-controller pods
    app: nginx-ingress-controller
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-ingress-controller
  labels:
    app: nginx-ingress-controller
spec:
  replicas: 1
  template:
    metadata:
      name: nginx-ingress-controller
      labels:
        app: nginx-ingress-controller
    spec:
      terminationGracePeriodSeconds: 60
      containers:
      - image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.9.0-beta.17
        name: nginx-ingress-controller
        readinessProbe:
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
        livenessProbe:
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          initialDelaySeconds: 10
          timeoutSeconds: 1
        ports:
        - containerPort: 443
          hostPort: 443
        env:
          - name: POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
        args:
        - /nginx-ingress-controller
        - --default-backend-service=$(POD_NAMESPACE)/nginx-default-backend
        - --publish-service=$(POD_NAMESPACE)/nginx-ingress-lb
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ingress
  namespace: development
spec:
  tls:
  - hosts:
    - site.domain.com
    secretName: "site.domain.com-tls"
  rules:
  - host: "site.domain.com"
    http:
      paths:
      - backend:
          serviceName: website
          servicePort: http

What I have checked so far:

I have checked my DNS records using host -a; they are both correct. I have checked for collisions in the label selectors using kubectl get po -l app=website; no collisions. And I have checked the bound IP addresses:

> kubectl get svc
NAME                    TYPE           CLUSTER-IP     EXTERNAL-IP      PORT(S)
api                     LoadBalancer   10.3.240.197   123.123.123.123  443:32126/TCP
nginx-default-backend   ClusterIP      10.3.253.16    <none>           80/TCP
nginx-ingress-lb        LoadBalancer   10.3.245.191   234.234.234.234  443:31051/TCP
website                 ClusterIP      10.3.254.180   <none>           80/TCP

> kubectl get ingress
NAME          HOSTS             ADDRESS           PORTS
ingress       site.domain.com   234.234.234.234   80, 443

> host api.domain.com
api.domain.com has address 123.123.123.123
> host site.domain.com
site.domain.com has address 234.234.234.234
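Comparing the Endpoints objects is another way to rule out selector overlap, since each service should list only its own pod IPs:

> kubectl get endpoints api nginx-ingress-lb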

All looks good to me.

Am I doing something wrong or is there something seriously wrong with k8s or nginx-ingress?

-- Léon Melis
kubernetes

1 Answer

11/19/2017

This was an interesting one.
I spent some time drawing diagrams and hypothesising about why the error was occurring, but the best answer comes from the GLBC README:

Don't start 2 instances of the controller in a single cluster, they will fight each other

Edit

I believe this behaviour is due to a conflict between how GCE load balancer forwarding rules work and the nginx-ingress-controller (or vice versa :) )

From what I can tell, GCE load balancer forwarding rules forward traffic to the cluster hosts on the same port number on which they accept it, i.e. :443 in your example.
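You can see both rules from the GCE side; the port range should show 443-443 for each of your services (rule names and regions will vary):

> gcloud compute forwarding-rules list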

In the nginx-ingress-controller definition:

ports:
- containerPort: 443
  hostPort: 443

We see that the nginx-ingress pods are listening on the cluster hosts at :443, but the GCE load balancer for the api service is also forwarding traffic to those same hosts at :443.
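You can also confirm the collision on a node itself. Depending on the network plugin, hostPort is implemented either as a userspace proxy or as iptables NAT rules, so one of these (run over SSH on a node carrying an ingress pod) should show :443 being claimed:

> sudo ss -tlnp | grep ':443'
> sudo iptables -t nat -S | grep -w 443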

Putting it all together

Imagine your API pods are deployed on some subset of the cluster nodes, say 3 out of 4.
Then 3 out of 4 times the GCE load balancer directs traffic to a host with a listening API pod - success!

But the 4th request routes to a node where no API pod is listening on port 443. An nginx-ingress-controller pod is listening there instead, and it responds to the request with a 404.
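One way to test this theory is to compare pod placement with the failure rate; the share of nodes running an ingress pod but no API pod should roughly match how often the 404 appears:

> kubectl get pods -l app=api -o wide
> kubectl get pods -l app=nginx-ingress-controller -o wide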

So the issue is not really one of DNS resolution as it might appear.
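If this is indeed the cause, a possible workaround (just a sketch, untested against your cluster) would be to drop the hostPort from the controller's container spec, so it listens only on its pod IP and receives traffic exclusively through the nginx-ingress-lb service:

ports:
- containerPort: 443
  # No hostPort: the pod is now reachable only via the nginx-ingress-lb
  # LoadBalancer service and kube-proxy, so it can no longer intercept
  # the API's traffic arriving on the node at :443.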


References

The quote below, from the k8s Services shortcomings documentation, seems to support my theory: the NodePort values are unused on GCE, so forwarding happens on the service port itself.

This is not strictly required on all cloud providers (e.g. Google Compute Engine does not need to allocate a NodePort to make LoadBalancer work, but AWS does)

GCE forwarding rule creation
https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/gce/gce_loadbalancer_external.go

-- stacksonstacks
Source: StackOverflow