Kubernetes not resolving node service

2/6/2018

I'm having issues with the internal DNS/service resolution within Kubernetes and I can't seem to track the issue down. I have an api-gateway pod running Kong, which calls other services by their internal service name, i.e. srv-name.staging.svc.cluster.local. This was working fine until recently. I then attempted to deploy three more services into two namespaces, staging and production.

The first of these services works as expected when called as booking-service.staging.svc.cluster.local; however, the same code doesn't work in the production namespace. The other two services don't work in either namespace.

The behavior I'm getting is a timeout. If I curl these services from my gateway pod, they all time out, apart from the first service deployed (booking-service.staging.svc.cluster.local). When I call these services from another container within the same pod, they work as expected.

I have Node services set up for each service I wish to expose to the client side.

Here's an example Kubernetes deployment:

---

# API

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: {{SRV_NAME}}
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: {{SRV_NAME}}
    spec:
        containers:
        - name: booking-api
          image: microhq/micro:kubernetes
          args:
            - "api"
            - "--handler=rpc"
          env:
          - name: PORT
            value: "8080"
          - name: ENV
            value: {{ENV}}
          - name: MICRO_REGISTRY
            value: "kubernetes"
          ports:
          - containerPort: 8080
        - name: {{SRV_NAME}}
          image: eu.gcr.io/{{PROJECT_NAME}}/{{SRV_NAME}}:latest
          imagePullPolicy: Always
          command: [
            "./service",
            "--selector=static"
          ]
          env:
          - name: MICRO_REGISTRY
            value: "kubernetes"
          - name: ENV
            value: {{ENV}}
          - name: DB_HOST
            value: {{DB_HOST}}
          - name: VERSION
            value: "{{VERSION}}"
          - name: MICRO_SERVER_ADDRESS
            value: ":50051"
          ports:
          - containerPort: 50051
            name: srv-port
---

apiVersion: v1
kind: Service
metadata:
  name: booking-service
spec:
  ports:
  - name: api-http
    port: 80
    targetPort: 8080
    protocol: TCP
  selector:
    app: booking-api

I'm using go-micro (https://github.com/micro/go-micro) with its Kubernetes pre-configuration. Again, this works absolutely fine in one case but not in the others, which leads me to believe the problem isn't in the code. It also works fine locally.
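Because the manifests are templated, one thing worth ruling out before looking at the network is whether each Service's selector actually matches the labels on its pods. In the manifest above, the Deployment labels its pods "app: {{SRV_NAME}}" while the Service selects "app: booking-api", so the two only line up if {{SRV_NAME}} renders to booking-api. A Service whose selector matches no pods has no endpoints, so connections to its cluster IP fail even though DNS still resolves the name. The Go sketch below mirrors equality-based selector matching as a Service applies it; the label values in main are hypothetical, and "kubectl get endpoints <service-name>" is the direct way to check the real cluster.

```go
package main

import "fmt"

// selects reports whether a Service's label selector matches a pod's labels.
// A pod is selected when it carries every key/value pair in the selector;
// extra labels on the pod are ignored. A Service with an empty selector does
// not pick up pods automatically, so that case is treated as "no match".
func selects(selector, podLabels map[string]string) bool {
	if len(selector) == 0 {
		return false
	}
	for key, want := range selector {
		if got, ok := podLabels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical rendered values: assumes {{SRV_NAME}} was templated to
	// "booking-service". Substitute the labels your pods actually carry.
	podLabels := map[string]string{"app": "booking-service"}
	serviceSelector := map[string]string{"app": "booking-api"}

	if selects(serviceSelector, podLabels) {
		fmt.Println("selector matches: these pods back the Service")
	} else {
		fmt.Println("selector does not match: these pods will not back the Service")
	}
}
```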

When I run nslookup from another pod, it resolves the name and returns the cluster IP of the internal Node service as expected. When I curl that IP address directly, I get the same timeout.

I'm using Kubernetes 1.8 on Google Cloud.

-- Ewan Valentine
dns
google-cloud-platform
google-kubernetes-engine
kubernetes
microservices

1 Answer

2/7/2018

I don't understand why you think this is an issue with the internal DNS/service resolution within Kubernetes: when you perform the DNS lookup it works, but when you query the resulting IP you get a connection timeout.

  • If you curl these services from outside the pod, they all time out apart from the first service deployed, whether you use the IP or the domain name.
  • When you call these services from another container within the same pod, they do work as expected.

This looks like an issue with the connection between pods rather than a DNS issue, so I would focus your troubleshooting in that direction, but correct me if I'm wrong.

Can you perform the classic network troubleshooting steps (ping, telnet, traceroute) from a pod toward the IP returned by the DNS lookup, and from one of the containers that is timing out toward one of the other pods, and then update the question with the results?

-- GalloCedrone
Source: StackOverflow