Kubernetes NodePort Service randomly returns "Network is unreachable" or hangs

8/27/2021

Here is my configuration:

Hosts/Nodes IP addresses: Primary=10.0.0.12, Worker1=10.0.0.16, Worker2=10.0.0.20

serviceAndDeployment.yaml file:

apiVersion: v1
kind: Service
metadata:
  name: my-webserver-service
  labels:
    app: nginx
spec:
  type: NodePort
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 8080
    targetPort: 80
    nodePort: 30080
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: nginx
      containers:
      - name: nginx
        image: nginx:alpine
        ports:
        - containerPort: 80

No problem with applying/deploying it:

kubectl apply -f serviceAndDeployment.yaml
kubectl get pods

Everything looks good so far; there is a pod on each node.
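For completeness, this is how I check the spread and the Service wiring (my-webserver-service is the Service defined above):

kubectl get pods -o wide
kubectl get endpoints my-webserver-service

The -o wide output confirms one pod on each worker, and the endpoints should list both pod IPs on port 80.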

Now, I log onto one of the nodes (10.0.0.20):

ss -tln

LISTEN     0      128                        *:30080                 *:*

...just the way we like it.
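Adding -p shows who owns the socket; with kube-proxy in its default iptables mode, the listener on a NodePort should be kube-proxy itself, which holds the port open just to reserve it (the actual forwarding is done by iptables):

sudo ss -tlnp | grep 30080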

curl localhost:30080

Perfect, we get "Welcome to nginx!"

BUT now do the same thing again:

curl localhost:30080

And it's "curl: (7) Failed connect to localhost:30080; Network is unreachable"

Repeating the request in rapid succession randomly (yes, randomly) gives one of three outcomes: the response we want, "Network is unreachable", or a complete hang until Ctrl-C.
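A quick way to quantify the randomness, assuming bash, with a 5-second client timeout so the hangs don't stall the loop:

for i in $(seq 1 20); do curl -sS -m 5 -o /dev/null -w "%{http_code}\n" localhost:30080; done

Successful attempts print 200; the unreachable and timed-out attempts print 000, with the curl error on stderr.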

Doing the same thing from any other host on the network, against either worker:

curl 10.0.0.20:30080  OR  curl 10.0.0.16:30080  FROM  10.0.0.XXX  (any host, whether the primary or another)

gives the exact same result.

sudo tcpdump -n --number -i ens160 | grep 10.0.0.14    (done on 10.0.0.20)

On the hang, I see this:

09:47:04.844525 IP 10.0.0.14.52500 > 10.0.0.20.30080: Flags [S], seq 77004116, win 29200, options [mss 1460,sackOK,TS val 664329183 ecr 0,nop,wscale 7], length 0

On "Network unreachable", I see this:

09:56:00.608617 IP 10.0.0.14.52504 > 10.0.0.20.30080: Flags [S], seq 2968934283, win 29200, options [mss 1460,sackOK,TS val 664864943 ecr 0,nop,wscale 7], length 0
09:56:00.611607 IP 10.0.0.20 > 10.0.0.14: ICMP net 10.0.0.20 unreachable, length 36

But I can ping 10.0.0.14 from 10.0.0.20 just fine.

I don't see any other traffic between those two without making a request, except the occasional ARP request.
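My next step is to look at what kube-proxy has actually programmed on 10.0.0.20, assuming it is running in its default iptables mode (conntrack comes from the conntrack-tools package):

sudo iptables -t nat -S | grep 30080
sudo conntrack -L 2>/dev/null | grep 30080

The first should show the KUBE-NODEPORTS entries for port 30080 (the DNAT to the two pod IPs lives in the per-service KUBE-SVC/KUBE-SEP chains, balanced by a statistic match); the second shows any conntrack entries for connections in flight.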

I have disabled firewalld with:

sudo systemctl disable --now firewalld

but the behavior does not change.
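One thing worth ruling out: stopping firewalld after kube-proxy has written its rules can flush chains kube-proxy depends on. Restarting kube-proxy makes it rewrite everything; this assumes a kubeadm-style cluster where kube-proxy runs as a DaemonSet labelled k8s-app=kube-proxy:

kubectl -n kube-system delete pod -l k8s-app=kube-proxy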

Changing the externalTrafficPolicy to Local does "work", in that I get the response I want every time, but then there is no load balancing across nodes. I already have HAProxy running as a reverse proxy and load balancer on another host, but, for some reason, it doesn't work with this NodePort as the backend. That is, I can curl 10.0.0.20:30080 successfully from the host that is running HAProxy, but HAProxy doesn't see it as a valid backend.
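For reference, a minimal HAProxy backend of the kind I mean would look like this (the backend and server names are placeholders, not my exact config):

backend k8s_nodeport
    balance roundrobin
    option httpchk GET /
    server worker1 10.0.0.16:30080 check
    server worker2 10.0.0.20:30080 check

The check keyword makes HAProxy health-check each server, so the flaky connections described above would plausibly make it mark the servers down.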

Why is port 30080 open and listening on my control node (which does not have an app pod running on it) if that node doesn't then act as a load balancer for the Service?

Why doesn't HAProxy work with the NodePort as the backend?

What is going on here?

Thank you.

-- Adam Winter
kubernetes
load-balancing
nginx
