kube-dns can not resolve 'kubernetes.default.svc.cluster.local'

3/1/2017

After deploying the Kubernetes cluster using kargo, I found out that the kubedns pod is not working properly:

$ kcsys get pods -o wide

NAME          READY STATUS           RESTARTS AGE  IP           NODE
dnsmasq-alv8k 1/1   Running          2        1d   10.233.86.2  kubemaster
dnsmasq-c9y52 1/1   Running          2        1d   10.233.82.2  kubeminion1
dnsmasq-sjouh 1/1   Running          2        1d   10.233.76.6  kubeminion2
kubedns-hxaj7 2/3   CrashLoopBackOff 339      22h  10.233.76.3  kubeminion2

PS: kcsys is an alias for kubectl --namespace=kube-system
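
For reference, the alias can be defined like this (a minimal sketch; add it to your shell profile if you want it to persist):

alias kcsys='kubectl --namespace=kube-system'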

The logs for each container (kubedns, dnsmasq) seem OK, except for the healthz container, which shows the following:

2017/03/01 07:24:32 Healthz probe error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local' error exit status 1
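
The failing check can also be reproduced by hand from inside the pod. A minimal sketch, assuming the pod and container names from the listing above (the healthz container is the one that already runs nslookup for this probe):

$ kcsys exec kubedns-hxaj7 -c healthz -- nslookup kubernetes.default.svc.cluster.local 127.0.0.1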

Update

kubedns rc description

apiVersion: v1
kind: ReplicationController
metadata:
  creationTimestamp: 2017-02-28T08:31:57Z
  generation: 1
  labels:
    k8s-app: kubedns
    kubernetes.io/cluster-service: "true"
    version: v19
  name: kubedns
  namespace: kube-system
  resourceVersion: "130982"
  selfLink: /api/v1/namespaces/kube-system/replicationcontrollers/kubedns
  uid: 5dc9f9f2-fd90-11e6-850d-005056a020b4
spec:
  replicas: 1
  selector:
    k8s-app: kubedns
    version: v19
  template:
    metadata:
      creationTimestamp: null
      labels:
        k8s-app: kubedns
        kubernetes.io/cluster-service: "true"
        version: v19
    spec:
      containers:
      - args:
        - --domain=cluster.local.
        - --dns-port=10053
        - --v=2
        image: gcr.io/google_containers/kubedns-amd64:1.9
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 5
          httpGet:
            path: /healthz
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        name: kubedns
        ports:
        - containerPort: 10053
          name: dns-local
          protocol: UDP
        - containerPort: 10053
          name: dns-tcp-local
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readiness
            port: 8081
            scheme: HTTP          
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        resources:
          limits:
            cpu: 100m
            memory: 170Mi
          requests:
            cpu: 70m
            memory: 70Mi
        terminationMessagePath: /dev/termination-log
      - args:
        - --log-facility=-
        - --cache-size=1000
        - --no-resolv
        - --server=127.0.0.1#10053
        image: gcr.io/google_containers/kube-dnsmasq-amd64:1.3
        imagePullPolicy: IfNotPresent
        name: dnsmasq
        ports:
        - containerPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          name: dns-tcp
          protocol: TCP
        resources:
          limits:
            cpu: 100m
            memory: 170Mi
          requests:
            cpu: 70m
            memory: 70Mi
        terminationMessagePath: /dev/termination-log
      - args:
        - -cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
          && nslookup kubernetes.default.svc.cluster.local 127.0.0.1:10053 >/dev/null
        - -port=8080
        - -quiet
        image: gcr.io/google_containers/exechealthz-amd64:1.1
        imagePullPolicy: IfNotPresent
        name: healthz
        ports:
        - containerPort: 8080
          protocol: TCP
        resources:
          limits:
            cpu: 10m
            memory: 50Mi
          requests:
            cpu: 10m
            memory: 50Mi
        terminationMessagePath: /dev/termination-log
      dnsPolicy: Default
      restartPolicy: Always
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  fullyLabeledReplicas: 1
  observedGeneration: 1
  replicas: 1
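
Given the liveness probe above (HTTP GET on /healthz, port 8080), the exechealthz endpoint can also be queried directly. A sketch, assuming the pod IP from the listing at the top and that curl is available on the node:

$ curl -s http://10.233.76.3:8080/healthz

The response mirrors the result of the last nslookup run, so a broken pod should return the same "can't resolve" message seen in the probe log.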

kubedns svc description:

apiVersion: v1
kind: Service
metadata:
  creationTimestamp: 2017-02-28T08:31:58Z
  labels:
    k8s-app: kubedns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: kubedns
  name: kubedns
  namespace: kube-system
  resourceVersion: "10736"
  selfLink: /api/v1/namespaces/kube-system/services/kubedns
  uid: 5ed4dd78-fd90-11e6-850d-005056a020b4
spec:
  clusterIP: 10.233.0.3
  ports:
  - name: dns
    port: 53
    protocol: UDP
    targetPort: 53
  - name: dns-tcp
    port: 53
    protocol: TCP
    targetPort: 53
  selector:
    k8s-app: kubedns
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
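
A quick sanity check against this Service is whether it currently has any endpoints behind it; with the kubedns pod stuck at 2/3 ready, the endpoints list will typically be empty. A sketch using the alias defined earlier:

$ kcsys get endpoints kubedns
$ kcsys describe svc kubedns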

I caught some errors in the kubedns container:

1 reflector.go:199] pkg/dns/dns.go:145: Failed to list *api.Endpoints: Get https://10.233.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.233.0.1:443: i/o timeout
1 reflector.go:199] pkg/dns/dns.go:148: Failed to list *api.Service: Get https://10.233.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.233.0.1:443: i/o timeout
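
Those timeouts point at basic connectivity from the pod network to the API server's cluster IP (10.233.0.1:443). A hedged way to test this from inside a container is a throwaway busybox pod (nc flags vary between busybox builds, and kubectl run flags differ slightly across versions):

$ kubectl run -it --rm nettest --image=busybox --restart=Never -- \
    sh -c 'nc -z -w 5 10.233.0.1 443 && echo reachable || echo unreachable'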

UPDATE 2

  1. iptables rules created by kube-proxy when creating a hostnames service with 3 pods (screenshot; the same rules can be dumped from a node, as shown below)

  2. flags of the controller-manager pod (screenshot)

  3. pods status (screenshot)
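
As a side note, the kube-proxy rules behind these screenshots can be inspected directly on any node. A sketch, assuming kube-proxy runs in its default iptables mode (chain names can differ by version):

$ sudo iptables -t nat -L KUBE-SERVICES -n | grep 10.233.0.3   # kubedns Service
$ sudo iptables -t nat -L KUBE-SERVICES -n | grep 10.233.0.1   # kubernetes API Service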

-- mootez
docker
kube-dns
kubernetes

2 Answers

3/5/2017

Can you take a look at the output from ps auxf | grep dockerd?

Kargo is adding the setting iptables=false to the Docker daemon. As far as I can see, this is causing issues with container-to-host networking, because connecting to 10.233.0.1:443 has to follow iptables rules that forward the request to one of the master nodes' API servers.
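
In other words, the thing to look for in that ps output is whether dockerd was started with --iptables=false (the flags will mirror the docker-options.conf quoted below):

$ ps auxf | grep [d]ockerd   # check the dockerd command line for --iptables=false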

The other Kubernetes services have their networking bound to the host, so they do not run into this issue.

I'm not sure if this is the root cause, but removing iptables=false from the Docker daemon settings fixed the issues we were experiencing. Docker's iptables integration is not disabled by default, and it does not need to be disabled in order to use network overlays like flannel.

Removing the iptables option for the Docker daemon can be done in /etc/systemd/system/docker.service.d/docker-options.conf, which should look something like this:

[root@k8s-joy-g2eqd2 ~]# cat /etc/systemd/system/docker.service.d/docker-options.conf
[Service]
Environment="DOCKER_OPTS=--insecure-registry=10.233.0.0/18 --graph=/var/lib/docker --iptables=false"

Once this is updated, run systemctl daemon-reload to register the change and then systemctl restart docker.
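
Put together, the edit-and-restart cycle looks roughly like this (a sketch; adjust the editor and sudo usage to your environment):

$ sudo vi /etc/systemd/system/docker.service.d/docker-options.conf   # drop --iptables=false from DOCKER_OPTS
$ sudo systemctl daemon-reload
$ sudo systemctl restart docker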

This will allow you to test if this fixes your issue. Once you can confirm this is the fix, you can override the docker_options variable in the kargo deployment to exclude that rule:

docker_options: "--insecure-registry=10.233.0.0/18 --graph=/var/lib/docker"
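
Where docker_options lives depends on your inventory layout; one commonly used (but here assumed) location is a group_vars file, for example:

# inventory/group_vars/all.yml -- path is an assumption, use wherever your kargo inventory defines docker_options
docker_options: "--insecure-registry=10.233.0.0/18 --graph=/var/lib/docker"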

-- the0rem
Source: StackOverflow

3/2/2017

According to the error you posted, kubedns cannot communicate with the API server:

dial tcp 10.233.0.1:443: i/o timeout

This can mean three things:

  1. Your network fabric for containers is not configured properly

  • Look for errors in the logs of the network solution you're using
  • Make sure every Docker deamon is using its own IP range
  • Verify that the container network does not overlap with the host network

  2. You have a problem with your kube-proxy, and network traffic is not forwarded to the API server when using the Kubernetes internal Service (10.233.0.1)

  • Check the kube-proxy logs on your nodes (kubeminion{1,2}) and update your question with any error you may find

  3. kube-controller-manager does not produce valid Service Account tokens (if you are also seeing authentication errors)

  • Check that the --service-account-private-key-file and --root-ca-file flags of kube-controller-manager are set to a valid key/cert and restart the service

  • Delete the default-token-xxxx secret in the kube-system namespace and recreate the kube-dns Deployment
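
A hedged sketch of those last two checks with kubectl; note that in this cluster kube-dns is actually a ReplicationController rather than a Deployment, so deleting the pod is enough to get it recreated. The controller-manager label selector is an assumption and may differ in a kargo deployment:

$ kubectl -n kube-system get pod -l k8s-app=kube-controller-manager -o yaml | grep -E 'service-account-private-key-file|root-ca-file'
$ kubectl -n kube-system get secrets | grep default-token
$ kubectl -n kube-system delete secret default-token-xxxx   # substitute the real secret name; it is recreated automatically
$ kubectl -n kube-system delete pod kubedns-hxaj7           # the ReplicationController recreates the pod with a fresh token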

-- Antoine Cotten
Source: StackOverflow