Accessing a service from an Alpine-based k8s pod throws a DNS resolution error

9/22/2021

I have pod A (it's actually the kube-scheduler pod) and pod B (a pod that has a REST API that will be invoked by pod A).

For this purpose, I created a ClusterIP service.

Now, when I exec into pod A to perform the API call to pod B, I get:

curl: (6) Could not resolve host: my-svc.default.svc.cluster.local

I tried to follow the debug instructions mentioned here:

kubectl exec -i -t dnsutils -- nslookup my-svc.default
Server:         10.96.0.10
Address:        10.96.0.10#53

Name:   my-svc.default.svc.cluster.local
Address: 10.111.181.13

Also:

kubectl exec -i -t dnsutils -- nslookup kubernetes.default
Server:         10.96.0.10
Address:        10.96.0.10#53

Name:   kubernetes.default.svc.cluster.local
Address: 10.96.0.1

This seems to be working as expected. However, when I exec into pod A, I get:

kubectl exec -it kube-scheduler -n kube-system -- sh
/bin # nslookup kubernetes.default
Server:         8.8.8.8
Address:        8.8.8.8:53

** server can't find kubernetes.default: NXDOMAIN

** server can't find kubernetes.default: NXDOMAIN

Other debugging steps (inside pod A) include:

/bin # cat /etc/resolv.conf
nameserver 8.8.8.8
nameserver 172.30.0.1

And:

/bin # cat /etc/*-release
3.12.8
NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.12.8
PRETTY_NAME="Alpine Linux v3.12"
HOME_URL="https://alpinelinux.org/"
BUG_REPORT_URL="https://bugs.alpinelinux.org/"

There are no useful logs from the coredns pods, either.

kubectl logs --namespace=kube-system -l k8s-app=kube-dns
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d

From the documentation, it seems there is a known issue with Alpine and DNS resolution (even though my version, 3.12.8, is newer than the one mentioned there).

Is there a workaround for this that would allow the service to be accessed properly from the Alpine-based pod?


Edit: adding pod A's manifest:

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: kube-scheduler
    tier: control-plane
  name: kube-scheduler
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-scheduler
    - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
    - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
    - --bind-address=127.0.0.1
    - --config=/etc/kubernetes/sched-cs.yaml
    - --port=0
    image: localhost:5000/scheduler-plugins/kube-scheduler:latest
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10259
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    name: kube-scheduler
    resources:
      requests:
        cpu: 100m
    startupProbe:
      failureThreshold: 24
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10259
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    volumeMounts:
    - mountPath: /etc/kubernetes/scheduler.conf
      name: kubeconfig
      readOnly: true
    - mountPath: /etc/kubernetes/sched-cs.yaml
      name: sched-cs
      readOnly: true
  hostNetwork: true
  priorityClassName: system-node-critical
  volumes:
  - hostPath:
      path: /etc/kubernetes/scheduler.conf
      type: FileOrCreate
    name: kubeconfig
  - hostPath:
      path: /etc/kubernetes/sched-cs.yaml
      type: FileOrCreate
    name: sched-cs
status: {}

Edit 2: adding the following lines manually to /etc/resolv.conf of pod A allows me to perform the curl request successfully.

nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

Is there a cleaner, less manual way to achieve the same result?

-- sqlquestionasker
alpine
coredns
docker
kubernetes

2 Answers

9/23/2021

The error curl: (6) Could not resolve host mainly occurs due to a wrong DNS setup or bad resolver settings on the server.

If you want to apply a custom DNS configuration, you can do so according to this documentation:

If a Pod's dnsPolicy is set to default, it inherits the name resolution configuration from the node that the Pod runs on. The Pod's DNS resolution should behave the same as the node. But see Known issues.

If you don't want this, or if you want a different DNS config for pods, you can use the kubelet's --resolv-conf flag. Set this flag to "" to prevent Pods from inheriting DNS. Set it to a valid file path to specify a file other than /etc/resolv.conf for DNS inheritance.
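
If you'd rather keep this in the manifest itself, a pod-level dnsConfig can declare the same values you would otherwise edit into resolv.conf by hand. A minimal sketch, assuming the kube-dns service IP 10.96.0.10 from your edit (the pod name, image, and command below are placeholders; verify the service IP with kubectl get svc -n kube-system):

apiVersion: v1
kind: Pod
metadata:
  name: dns-example          # placeholder name
spec:
  containers:
  - name: app
    image: alpine:3.12       # placeholder image
    command: ["sleep", "3600"]
  dnsPolicy: "None"          # do not inherit DNS from the node or kubelet
  dnsConfig:
    nameservers:
    - 10.96.0.10             # cluster DNS (kube-dns/CoreDNS service IP)
    searches:
    - default.svc.cluster.local
    - svc.cluster.local
    - cluster.local
    options:
    - name: ndots
      value: "5"

The kubelet renders exactly these values into the pod's /etc/resolv.conf, which matches what you added manually in Edit 2.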

Another solution would be to build your own image with the desired resolver values already baked in.

-- Mikołaj Głodziak
Source: StackOverflow

9/23/2021

Try setting the dnsPolicy for pod A (or for whatever Deployment, StatefulSet, etc. defines its template) to ClusterFirst or ClusterFirstWithHostNet.

The behavior of this setting depends on how your cluster and kubelet are set up, but in most default configurations it makes the kubelet point resolv.conf inside the pod at the kube-dns service you added manually in your edit (10.96.0.10), which in turn forwards lookups for names outside the cluster to the host's upstream nameservers.
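
Since the manifest in your question sets hostNetwork: true, plain ClusterFirst would silently fall back to the node's DNS; ClusterFirstWithHostNet is the variant that applies here. A sketch of the relevant fragment of the pod spec (the rest stays as posted):

spec:
  hostNetwork: true
  dnsPolicy: ClusterFirstWithHostNet   # use cluster DNS despite host networking

If kube-scheduler runs as a static pod, make this change in its manifest under the kubelet's static-pod directory (commonly /etc/kubernetes/manifests) and the kubelet will recreate the pod with the new setting.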

K8s docs

-- switchboard.op
Source: StackOverflow