Connection Refused between Kubernetes pods in the same cluster

1/14/2021

I am new to Kubernetes and I'm working on deploying an application within a new Kubernetes cluster.

Currently, the service running has multiple pods that need to communicate with each other. I'm looking for a general approach to debugging the issue, rather than getting into the specifics of the service, as that would make the question far too narrow.

The pods within the cluster are throwing an error:

err="Get \"http://testpod.mynamespace.svc.cluster.local:8080/\": dial tcp 10.10.80.100:8080: connect: connection refused"

Both pods are in the same cluster.

What are the best steps to take to debug this?

I have tried running:

kubectl exec -it testpod --namespace mynamespace -- cat /etc/resolv.conf

And this returns:

search mynamespace.svc.cluster.local svc.cluster.local cluster.local us-east-2.compute.internal

Which I found here: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/

-- fuzzi
kubernetes
kubernetes-pod

1 Answer

1/15/2021

First of all, the following pattern:

my-svc.my-namespace.svc.cluster-domain.example

is applicable only to FQDNs of Services, not Pods, which have the following form:

pod-ip-address.my-namespace.pod.cluster-domain.example

e.g.:

172-17-0-3.default.pod.cluster.local
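
If you want to double-check how these names resolve from inside the cluster, you can run nslookup from a throwaway pod (busybox:1.28 is just one example image that ships a working nslookup):

kubectl run dns-test --rm -it --restart=Never --image=busybox:1.28 -- nslookup testpod.mynamespace.svc.cluster.local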

So in fact you're querying the cluster DNS about the FQDN of the Service named testpod, not about the FQDN of the Pod. Judging by the fact that the name resolves successfully, such a Service already exists in your cluster but is most probably misconfigured. The fact that you're getting the error message connection refused can mean the following:

  1. your Service FQDN testpod.mynamespace.svc.cluster.local has been successfully resolved (otherwise you would receive something like curl: (6) Could not resolve host: testpod.default.svc.cluster.local)
  2. you've successfully reached your testpod Service (otherwise, i.e. if it existed but wasn't listening on the 8080 port you're trying to connect to, you would receive a timeout, e.g. curl: (7) Failed to connect to testpod.default.svc.cluster.local port 8080: Connection timed out)
  3. you've reached the Pod exposed by the testpod Service (you've been successfully redirected to it by the testpod Service)
  4. but once the Pod was reached, you're trying to connect to an incorrect port, and that's why the connection is being refused by the server (the checks right after this list can help confirm this)
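
To confirm points 1-3 and pin down point 4, you can check where the Service actually points (the names below match your error message; adjust them if yours differ):

kubectl describe service testpod --namespace mynamespace
kubectl get endpoints testpod --namespace mynamespace

The TargetPort in the first output and the port shown next to the Pod IP in the second one are where the Service forwards traffic; if that's 8080 while your container listens on something else, you'll get exactly the connection refused you're seeing.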

My best guess is that your Pod in fact listens on a different port, like 80, but you exposed it via the ClusterIP Service by specifying only the --port value, e.g. by:

kubectl expose pod testpod --port=8080

In such a case both port (the port of the Service) and targetPort (the port of the Pod) get the same value, because targetPort defaults to the value of --port when --target-port isn't given. In other words you've created a Service like the one below:

apiVersion: v1
kind: Service
metadata:
  name: testpod
spec:
  ports:
    - protocol: TCP
      port: 8080
      targetPort: 8080
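
You can dump your current Service and compare it with the manifest above, e.g.:

kubectl get service testpod --namespace mynamespace -o yaml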

And you probably should've exposed it either this way:

kubectl expose pod testpod --port=8080 --target-port=80

or with the following yaml manifest:

apiVersion: v1
kind: Service
metadata:
  name: testpod
spec:
  ports:
    - protocol: TCP
      port: 8080
      targetPort: 80

Of course your targetPort may be different than 80, but connection refused in such a case can mean only one thing: the target HTTP server (running in a Pod) refuses the connection on port 8080, most probably because it isn't listening on it. You didn't specify what image you're using, whether it's a standard nginx webserver or something based on your custom image, but if it's nginx and wasn't configured differently, it listens on port 80.
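
If that turns out to be the case, you don't need to recreate the Service; you can adjust the targetPort of the existing one in place, e.g. with a patch like the one below (80 here is only an assumption, use whatever port your container really listens on):

kubectl patch service testpod --namespace mynamespace -p '{"spec":{"ports":[{"protocol":"TCP","port":8080,"targetPort":80}]}}'

Alternatively, kubectl edit service testpod --namespace mynamespace lets you change the same field interactively.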

For further debugging, you can attach to your Pod:

kubectl exec -it testpod --namespace mynamespace -- /bin/sh

and if the netstat command is not present (the most likely scenario), run:

apt update && apt install -y net-tools

and then run netstat -ntlp to check which port your container is listening on.
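
Once you know the port the container actually listens on, you can also test it directly, bypassing the Service, from a throwaway client pod (curlimages/curl is just one example image; replace <pod-ip> and <port> with the Pod IP shown by kubectl get pod -o wide and the port you found with netstat):

kubectl get pod testpod --namespace mynamespace -o wide
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl -- curl -v http://<pod-ip>:<port>/

If that request succeeds while the Service URL still fails, the mismatch between the Service's targetPort and the container's actual port is confirmed.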

I hope this helps you solve your issue. In case of any doubts, don't hesitate to ask.

-- mario
Source: StackOverflow