I am new to Kubernetes and I'm working on deploying an application within a new Kubernetes cluster.
Currently, the service running has multiple pods that need to communicate with each other. I'm looking for a general approach to debugging the issue, rather than getting into the specifics of the service, as the question would become much too specific.
The pods within the cluster are throwing an error:
err="Get \"http://testpod.mynamespace.svc.cluster.local:8080/\": dial tcp 10.10.80.100:8080: connect: connection refused"
Both pods are in the same cluster.
What are the best steps to take to debug this?
I have tried running:
kubectl exec -it testpod --namespace mynamespace -- cat /etc/resolv.conf
And this returns:
search mynamespace.svc.cluster.local svc.cluster.local cluster.local us-east-2.compute.internal
Which I found here: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/
First of all, the following pattern:
my-svc.my-namespace.svc.cluster-domain.example
is applicable only to FQDNs of Services, not to Pods, which have the following form:
pod-ip-address.my-namespace.pod.cluster-domain.example
e.g.:
172-17-0-3.default.pod.cluster.local
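To make the naming difference concrete, here is a minimal shell sketch that builds the Pod-style DNS name from a Pod IP and a namespace. The function name pod_fqdn is hypothetical, and the sketch assumes the cluster domain is cluster.local, as in the example above.

```shell
# Build the DNS name of a Pod from its IP address and namespace.
# Assumption: the cluster domain is cluster.local (pass a third argument to override).
pod_fqdn() {
  ip=$1
  namespace=$2
  domain=${3:-cluster.local}
  # Pod DNS names use the Pod IP with the dots replaced by dashes.
  printf '%s.%s.pod.%s\n' "$(printf '%s' "$ip" | tr '.' '-')" "$namespace" "$domain"
}

pod_fqdn 172.17.0.3 default
```

A name of the form testpod.mynamespace.svc.cluster.local can never come out of this function, which is why it has to refer to a Service.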
So in fact you're querying the cluster DNS about the FQDN of the Service named testpod and not about the FQDN of the Pod. Judging by the fact that it's being resolved successfully, such a Service already exists in your cluster but is most probably misconfigured.
The fact that you're getting the error message connection refused can mean the following:
- your Service FQDN testpod.mynamespace.svc.cluster.local has been successfully resolved (otherwise you would receive something like curl: (6) Could not resolve host: testpod.default.svc.cluster.local)
- you've successfully reached your testpod Service (otherwise, i.e. if it existed but wasn't listening on port 8080, which you're trying to connect to, you would receive a timeout, e.g. curl: (7) Failed to connect to testpod.default.svc.cluster.local port 8080: Connection timed out)
- you've reached the Pod exposed by the testpod Service (you've been successfully redirected to it by the testpod Service)
- once you've reached the Pod, you're trying to connect to an incorrect port, and that's why the connection is being refused by the server
My best guess is that your Pod in fact listens on a different port, like 80, but you exposed it via the ClusterIP Service by specifying only the --port value, e.g. by:
kubectl expose pod testpod --port=8080
In such a case both --port (port of the Service) and --target-port (port of the Pod) will have the same value. In other words, you've created a Service like the one below:
apiVersion: v1
kind: Service
metadata:
  name: testpod
spec:
  ports:
    - protocol: TCP
      port: 8080
      targetPort: 8080
And you probably should've exposed it either this way:
kubectl expose pod testpod --port=8080 --target-port=80
or with the following yaml manifest:
apiVersion: v1
kind: Service
metadata:
  name: testpod
spec:
  ports:
    - protocol: TCP
      port: 8080
      targetPort: 80
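Before changing anything, it may help to see what the existing Service and Pod actually declare. The following is a rough sketch, not a definitive check: compare_ports is a hypothetical helper, it assumes the Service defines a single numeric port, and it relies on containerPort, which is informational only (a container can listen on a port it never declares).

```shell
# Compare a Service's targetPort with the containerPort values declared by a Pod.
# Hypothetical helper; usage: compare_ports <service> <pod> <namespace>
compare_ports() {
  # Assumption: the Service defines a single port, so index 0 is the one of interest.
  target=$(kubectl get service "$1" --namespace "$3" \
    -o jsonpath='{.spec.ports[0].targetPort}')
  if [ -z "$target" ]; then
    echo "could not read targetPort of Service $1"
    return 1
  fi
  # containerPort is informational only; a container may listen on undeclared ports.
  declared=$(kubectl get pod "$2" --namespace "$3" \
    -o jsonpath='{.spec.containers[*].ports[*].containerPort}')
  echo "Service targetPort:   $target"
  echo "Pod containerPort(s): ${declared:-none declared}"
  # Look for the targetPort as a whole word in the space-separated list.
  case " $declared " in
    *" $target "*) echo "match" ;;
    *)             echo "mismatch" ;;
  esac
}

# Example call (needs access to the cluster):
#   compare_ports testpod testpod mynamespace
```

A "mismatch" here is only a hint. Running kubectl get endpoints testpod --namespace mynamespace additionally shows the IP:port pairs the Service really forwards to.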
Of course your targetPort may be different than 80, but connection refused in such a case most likely means one thing: the target HTTP server (running in a Pod) refuses the connection on port 8080 (most probably because it isn't listening on it). You didn't specify what image you're using, whether it's a standard nginx webserver or something based on your custom image. But if it's nginx and it wasn't configured differently, it listens on port 80.
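The three outcomes distinguished earlier (name not resolved, timeout, connection refused) can be told apart from the error text alone. Below is a minimal sketch of that triage; classify_curl_error is a hypothetical helper, and the patterns are assumptions based on the messages quoted in this answer, since the exact wording varies between curl versions.

```shell
# Map an error message to one of the failure modes discussed in this answer.
classify_curl_error() {
  case "$1" in
    *"Could not resolve host"*) echo "dns" ;;      # the Service name did not resolve
    *"onnection refused"*)      echo "refused" ;;  # nothing accepted the connection on that port
    *"timed out"*|*"timeout"*)  echo "timeout" ;;  # no answer at all
    *)                          echo "unknown" ;;
  esac
}

# Example call (assumes curl exists in the client image; "clientpod" is a placeholder name):
#   msg=$(kubectl exec clientpod --namespace mynamespace -- \
#     curl -sS --max-time 5 http://testpod.mynamespace.svc.cluster.local:8080/ 2>&1)
#   classify_curl_error "$msg"
```

The pattern for "refused" deliberately skips the first letter so that it also matches the lower-case Go error from the question.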
For further debugging, you can attach to your Pod:
kubectl exec -it testpod --namespace mynamespace -- /bin/sh
and if the netstat command is not present (the most likely scenario), install it (this assumes a Debian-based image):
apt update && apt install net-tools
and then check with netstat -ntlp which port your container listens on.
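If the image has no package manager, or you would rather not install anything in the container, the same information can be read from /proc/net/tcp. This is an alternative to netstat rather than part of the approach above, and a sketch only: listening_ports is a hypothetical helper that runs on your workstation and assumes awk is available there and cat is available in the container.

```shell
# Read /proc/net/tcp-formatted text on stdin and print the listening TCP ports.
listening_ports() {
  # Column 2 is local_address as HEXIP:HEXPORT, column 4 is the socket state;
  # state 0A means LISTEN.
  awk '$4 == "0A" { n = split($2, a, ":"); print a[n] }' |
  while read -r hex; do
    # Convert the hexadecimal port to decimal.
    printf '%d\n' "0x$hex"
  done | sort -n -u
}

# Example call (needs access to the cluster; tcp6 covers IPv6 listeners):
#   kubectl exec testpod --namespace mynamespace -- \
#     cat /proc/net/tcp /proc/net/tcp6 | listening_ports
```

If 8080 is missing from the output while another port such as 80 is present, that port is the value to use for the Service's targetPort.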
I hope this helps you solve your issue. In case of any doubts, don't hesitate to ask.