DNS addon enters CrashLoopBackOff in Kubernetes 1.4.5

11/18/2016

As a newcomer to Kubernetes, I followed the tutorial below using a "packstack" cluster. The tutorial uses Kubernetes version 1.2.1, which I am told is quite old:

http://kubernetes.io/docs/getting-started-guides/coreos/coreos_multinode_cluster/

Everything seemed OK and I was able to launch pods, so I thought I would try to install the DNS addon as a Kubernetes service/rc. I googled around and saw that the DNS addon requires Kubernetes version 1.3 or later.

I bumped the Kubernetes version in my master and node cloud-configs to 1.4.5 and tried again. Again, everything seems to work, except that when I try to launch the DNS replication controller (and service) I see errors in the log starting with:

Expected to load root CA config from /var/run/secrets/kubernetes.io/serviceaccount/ca.crt, but got err: open /var/run/secrets/kubernetes.io/serviceaccount/ca.crt: no such file or directory

I'm unsure how to proceed. I've tried to create another service account, but there is no certificate in its secret. My cloud config does not reference any root certs, so I am wondering if that is the problem?

I've attached my master and node cloud configurations, my DNS addon yaml file and the DNS server logs.

Master cloud config:

#cloud-config

---
write-files:
  - path: /etc/conf.d/nfs
    permissions: '0644'
    content: |
      OPTS_RPC_MOUNTD=""
  - path: /opt/bin/wupiao
    permissions: '0755'
    content: |
      #!/bin/bash
      # [w]ait [u]ntil [p]ort [i]s [a]ctually [o]pen
      [ -n "$1" ] && \
        until curl -o /dev/null -sIf http://${1}; do \
          sleep 1 && echo .;
        done;
      exit $?

hostname: master
coreos:
  etcd2:
    name: master
    listen-client-urls: http://0.0.0.0:2379,http://0.0.0.0:4001
    advertise-client-urls: http://$private_ipv4:2379,http://$private_ipv4:4001
    initial-cluster-token: k8s_etcd
    listen-peer-urls: http://$private_ipv4:2380,http://$private_ipv4:7001
    initial-advertise-peer-urls: http://$private_ipv4:2380
    initial-cluster: master=http://$private_ipv4:2380
    initial-cluster-state: new
  fleet:
    metadata: "role=master"
  units:
    - name: etcd2.service
      command: start
    - name: generate-serviceaccount-key.service
      command: start
      content: |
        [Unit]
        Description=Generate service-account key file

        [Service]
        ExecStartPre=-/usr/bin/mkdir -p /opt/bin
        ExecStart=/bin/openssl genrsa -out /opt/bin/kube-serviceaccount.key 2048 2>/dev/null
        RemainAfterExit=yes
        Type=oneshot
    - name: setup-network-environment.service
      command: start
      content: |
        [Unit]
        Description=Setup Network Environment
        Documentation=https://github.com/kelseyhightower/setup-network-environment
        Requires=network-online.target
        After=network-online.target

        [Service]
        ExecStartPre=-/usr/bin/mkdir -p /opt/bin
        ExecStartPre=/usr/bin/curl -L -o /opt/bin/setup-network-environment -z /opt/bin/setup-network-environment https://github.com/kelseyhightower/setup-network-environment/releases/download/v1.0.0/setup-network-environment
        ExecStartPre=/usr/bin/chmod +x /opt/bin/setup-network-environment
        ExecStart=/opt/bin/setup-network-environment
        RemainAfterExit=yes
        Type=oneshot
    - name: fleet.service
      command: start
    - name: flanneld.service
      command: start
      drop-ins:
        - name: 50-network-config.conf
          content: |
            [Unit]
            Requires=etcd2.service
            [Service]
            ExecStartPre=/usr/bin/etcdctl set /coreos.com/network/config '{"Network":"10.244.0.0/16", "Backend": {"Type": "vxlan"}}'
    - name: docker.service
      command: start
    - name: kube-apiserver.service
      command: start
      content: |
        [Unit]
        Description=Kubernetes API Server
        Documentation=https://github.com/kubernetes/kubernetes
        Requires=setup-network-environment.service etcd2.service generate-serviceaccount-key.service
        After=setup-network-environment.service etcd2.service generate-serviceaccount-key.service

        [Service]
        EnvironmentFile=/etc/network-environment
        ExecStartPre=-/usr/bin/mkdir -p /opt/bin
        ExecStartPre=/usr/bin/curl -L -o /opt/bin/kube-apiserver -z /opt/bin/kube-apiserver https://storage.googleapis.com/kubernetes-release/release/v1.4.5/bin/linux/amd64/kube-apiserver
        ExecStartPre=/usr/bin/chmod +x /opt/bin/kube-apiserver
        ExecStartPre=/opt/bin/wupiao 127.0.0.1:2379/v2/machines
        ExecStart=/opt/bin/kube-apiserver \
        --service-account-key-file=/opt/bin/kube-serviceaccount.key \
        --service-account-lookup=false \
        --admission-control=NamespaceLifecycle,NamespaceAutoProvision,LimitRanger,SecurityContextDeny,ServiceAccount,ResourceQuota \
        --runtime-config=api/v1 \
        --allow-privileged=true \
        --insecure-bind-address=0.0.0.0 \
        --insecure-port=8080 \
        --kubelet-https=true \
        --secure-port=6443 \
        --service-cluster-ip-range=10.244.0.0/16 \
        --etcd-servers=http://127.0.0.1:2379 \
        --public-address-override=${DEFAULT_IPV4} \
        --logtostderr=true
        Restart=always
        RestartSec=10
    - name: kube-controller-manager.service
      command: start
      content: |
        [Unit]
        Description=Kubernetes Controller Manager
        Documentation=https://github.com/kubernetes/kubernetes
        Requires=kube-apiserver.service
        After=kube-apiserver.service

        [Service]
        ExecStartPre=/usr/bin/curl -L -o /opt/bin/kube-controller-manager -z /opt/bin/kube-controller-manager https://storage.googleapis.com/kubernetes-release/release/v1.4.5/bin/linux/amd64/kube-controller-manager
        ExecStartPre=/usr/bin/chmod +x /opt/bin/kube-controller-manager
        ExecStart=/opt/bin/kube-controller-manager \
        --service-account-private-key-file=/opt/bin/kube-serviceaccount.key \
        --master=127.0.0.1:8080 \
        --logtostderr=true
        Restart=always
        RestartSec=10
    - name: kube-scheduler.service
      command: start
      content: |
        [Unit]
        Description=Kubernetes Scheduler
        Documentation=https://github.com/kubernetes/kubernetes
        Requires=kube-apiserver.service
        After=kube-apiserver.service

        [Service]
        ExecStartPre=/usr/bin/curl -L -o /opt/bin/kube-scheduler -z /opt/bin/kube-scheduler https://storage.googleapis.com/kubernetes-release/release/v1.4.5/bin/linux/amd64/kube-scheduler
        ExecStartPre=/usr/bin/chmod +x /opt/bin/kube-scheduler
        ExecStart=/opt/bin/kube-scheduler --master=127.0.0.1:8080
        Restart=always
        RestartSec=10
  update:
    group: alpha
    reboot-strategy: off

Node cloud config:

#cloud-config
write-files:
  - path: /opt/bin/wupiao
    permissions: '0755'
    content: |
      #!/bin/bash
      # [w]ait [u]ntil [p]ort [i]s [a]ctually [o]pen
      [ -n "$1" ] && [ -n "$2" ] && while ! curl --output /dev/null \
        --silent --head --fail \
        http://${1}:${2}; do sleep 1 && echo -n .; done;
      exit $?
coreos:
  etcd2:
    listen-client-urls: http://0.0.0.0:2379,http://0.0.0.0:4001
    advertise-client-urls: http://0.0.0.0:2379,http://0.0.0.0:4001
    initial-cluster: master=http://10.10.1.31:2380
    proxy: on
  fleet:
    metadata: "role=node"
  units:
    - name: etcd2.service
      command: start
    - name: fleet.service
      command: start
    - name: flanneld.service
      command: start
    - name: docker.service
      command: start
    - name: setup-network-environment.service
      command: start
      content: |
        [Unit]
        Description=Setup Network Environment
        Documentation=https://github.com/kelseyhightower/setup-network-environment
        Requires=network-online.target
        After=network-online.target

        [Service]
        ExecStartPre=-/usr/bin/mkdir -p /opt/bin
        ExecStartPre=/usr/bin/curl -L -o /opt/bin/setup-network-environment -z /opt/bin/setup-network-environment https://github.com/kelseyhightower/setup-network-environment/releases/download/v1.0.0/setup-network-environment
        ExecStartPre=/usr/bin/chmod +x /opt/bin/setup-network-environment
        ExecStart=/opt/bin/setup-network-environment
        RemainAfterExit=yes
        Type=oneshot
    - name: kube-proxy.service
      command: start
      content: |
        [Unit]
        Description=Kubernetes Proxy
        Documentation=https://github.com/kubernetes/kubernetes
        Requires=setup-network-environment.service
        After=setup-network-environment.service

        [Service]
        ExecStartPre=/usr/bin/curl -L -o /opt/bin/kube-proxy -z /opt/bin/kube-proxy https://storage.googleapis.com/kubernetes-release/release/v1.4.5/bin/linux/amd64/kube-proxy
        ExecStartPre=/usr/bin/chmod +x /opt/bin/kube-proxy
        # wait for kubernetes master to be up and ready
        ExecStartPre=/opt/bin/wupiao 10.10.1.31 8080
        ExecStart=/opt/bin/kube-proxy \
        --master=10.10.1.31:8080 \
        --logtostderr=true
        Restart=always
        RestartSec=10
    - name: kube-kubelet.service
      command: start
      content: |
        [Unit]
        Description=Kubernetes Kubelet
        Documentation=https://github.com/kubernetes/kubernetes
        Requires=setup-network-environment.service
        After=setup-network-environment.service

        [Service]
        EnvironmentFile=/etc/network-environment
        ExecStartPre=/usr/bin/curl -L -o /opt/bin/kubelet -z /opt/bin/kubelet https://storage.googleapis.com/kubernetes-release/release/v1.4.5/bin/linux/amd64/kubelet
        ExecStartPre=/usr/bin/chmod +x /opt/bin/kubelet
        # wait for kubernetes master to be up and ready
        ExecStartPre=/opt/bin/wupiao 10.10.1.31 8080
        ExecStart=/opt/bin/kubelet \
        --address=0.0.0.0 \
        --port=10250 \
        --hostname-override=${DEFAULT_IPV4} \
        --api-servers=10.10.1.31:8080 \
        --allow-privileged=true \
        --logtostderr=true \
        --cadvisor-port=4194 \
        --healthz-bind-address=0.0.0.0 \
        --healthz-port=10248
        Restart=always
        RestartSec=10
  update:
    group: alpha
    reboot-strategy: off

DNS addon YAML:

apiVersion: v1
kind: Service
metadata:
  name: kube-dns
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: "KubeDNS"
spec:
  selector:
    k8s-app: kube-dns
  clusterIP: 10.244.0.5
  ports:
  - name: dns
    port: 53
    protocol: UDP
  - name: dns-tcp
    port: 53
    protocol: TCP


---


apiVersion: v1
kind: ReplicationController
metadata:
  name: kube-dns-v20
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    version: v20
    kubernetes.io/cluster-service: "true"
spec:
  replicas: 1
  selector:
    k8s-app: kube-dns
    version: v20
  template:
    metadata:
      labels:
        k8s-app: kube-dns
        version: v20
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
        scheduler.alpha.kubernetes.io/tolerations: '[{"key":"CriticalAddonsOnly", "operator":"Exists"}]'
    spec:
      containers:
      - name: kubedns
        image: gcr.io/google_containers/kubedns-amd64:1.8
        resources:
          limits:
            memory: 170Mi
          requests:
            cpu: 100m
            memory: 70Mi
        livenessProbe:
          httpGet:
            path: /healthz-kubedns
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 60
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 5
        readinessProbe:
          httpGet:
            path: /readiness
            port: 8081
            scheme: HTTP
          initialDelaySeconds: 3
          timeoutSeconds: 5
        args:
        - --domain=cluster.local.
        - --dns-port=10053
        ports:
        - containerPort: 10053
          name: dns-local
          protocol: UDP
        - containerPort: 10053
          name: dns-tcp-local
          protocol: TCP
      - name: dnsmasq
        image: gcr.io/google_containers/kube-dnsmasq-amd64:1.4
        livenessProbe:
          httpGet:
            path: /healthz-dnsmasq
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 60
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 5
        args:
        - --cache-size=1000
        - --no-resolv
        - --server=127.0.0.1#10053
        - --log-facility=-
        ports:
        - containerPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          name: dns-tcp
          protocol: TCP
      - name: healthz
        image: gcr.io/google_containers/exechealthz-amd64:1.2
        resources:
          limits:
            memory: 50Mi
          requests:
            cpu: 10m
            memory: 50Mi
        args:
        - --cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
        - --url=/healthz-dnsmasq
        - --cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1:10053 >/dev/null
        - --url=/healthz-kubedns
        - --port=8080
        - --quiet
        ports:
        - containerPort: 8080
          protocol: TCP
      dnsPolicy: Default

DNS addon log:

E1118 17:33:10.140677       1 config.go:265] Expected to load root CA config from /var/run/secrets/kubernetes.io/serviceaccount/ca.crt, but got err: open /var/run/secrets/kubernetes.io/serviceaccount/ca.crt: no such file or directory
I1118 17:33:10.141079       1 server.go:94] Using https://10.244.0.1:443 for kubernetes master, kubernetes API: <nil>
I1118 17:33:10.141596       1 server.go:99] v1.5.0-alpha.0.1651+7dcae5edd84f06-dirty
I1118 17:33:10.141728       1 server.go:101] FLAG: --alsologtostderr="false"
I1118 17:33:10.141840       1 server.go:101] FLAG: --dns-port="10053"
I1118 17:33:10.141931       1 server.go:101] FLAG: --domain="cluster.local."
I1118 17:33:10.142073       1 server.go:101] FLAG: --federations=""
I1118 17:33:10.142171       1 server.go:101] FLAG: --healthz-port="8081"
I1118 17:33:10.142260       1 server.go:101] FLAG: --kube-master-url=""
I1118 17:33:10.142345       1 server.go:101] FLAG: --kubecfg-file=""
I1118 17:33:10.142433       1 server.go:101] FLAG: --log-backtrace-at=":0"
I1118 17:33:10.142522       1 server.go:101] FLAG: --log-dir=""
I1118 17:33:10.142605       1 server.go:101] FLAG: --log-flush-frequency="5s"
I1118 17:33:10.142688       1 server.go:101] FLAG: --logtostderr="true"
I1118 17:33:10.142771       1 server.go:101] FLAG: --stderrthreshold="2"
I1118 17:33:10.142853       1 server.go:101] FLAG: --v="0"
I1118 17:33:10.142932       1 server.go:101] FLAG: --version="false"
I1118 17:33:10.143056       1 server.go:101] FLAG: --vmodule=""
I1118 17:33:10.143247       1 server.go:138] Starting SkyDNS server. Listening on port:10053
I1118 17:33:10.143455       1 server.go:145] skydns: metrics enabled on : /metrics:
I1118 17:33:10.143556       1 dns.go:166] Waiting for service: default/kubernetes
I1118 17:33:10.144214       1 logs.go:41] skydns: ready for queries on cluster.local. for tcp://0.0.0.0:10053 [rcache 0]
I1118 17:33:10.144358       1 logs.go:41] skydns: ready for queries on cluster.local. for udp://0.0.0.0:10053 [rcache 0]
I1118 17:33:10.154429       1 dns.go:172] Ignoring error while waiting for service default/kubernetes: Get https://10.244.0.1:443/api/v1/namespaces/default/services/kubernetes: x509: failed to load system roots and no roots provided. Sleeping 1s before retrying.
E1118 17:33:10.159852       1 reflector.go:214] pkg/dns/dns.go:155: Failed to list *api.Service: Get https://10.244.0.1:443/api/v1/services?resourceVersion=0: x509: failed to load system roots and no roots provided
E1118 17:33:10.171051       1 reflector.go:214] pkg/dns/dns.go:154: Failed to list *api.Endpoints: Get https://10.244.0.1:443/api/v1/endpoints?resourceVersion=0: x509: failed to load system roots and no roots provided
I1118 17:33:11.157527       1 dns.go:172] Ignoring error while waiting for service default/kubernetes: Get https://10.244.0.1:443/api/v1/namespaces/default/services/kubernetes: x509: failed to load system roots and no roots provided. Sleeping 1s before retrying.
E1118 17:33:11.166217       1 reflector.go:214] pkg/dns/dns.go:155: Failed to list *api.Service: Get https://10.244.0.1:443/api/v1/services?resourceVersion=0: x509: failed to load system roots and no roots provided
E1118 17:33:11.181840       1 reflector.go:214] pkg/dns/dns.go:154: Failed to list *api.Endpoints: Get https://10.244.0.1:443/api/v1/endpoints?resourceVersion=0: x509: failed to load system roots and no roots provided
-- Gapmeister66
certificate
coreos
dns
kubernetes

2 Answers

11/23/2016

As you proposed in your answer, you can talk to the insecure port, which bypasses authentication and works around the issue with kubeDNS. However, this will not fix the secrets for anything else that uses Service Accounts in your cluster.

The reason that there is no CA included in the secret is that you haven't told the Controller Manager to include one. You can provide the path for the Root CA with the flag --root-ca-file.

From the kube-controller-manager documentation:

If set, this root certificate authority will be included in service account's token secret. This must be a valid PEM-encoded CA bundle.
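
As a minimal sketch of that change, the controller manager unit from the master cloud-config in the question could gain one extra flag, as shown here. The path /etc/kubernetes/ssl/ca.pem is an assumption for illustration only; the CA file would have to be generated and written to the master first. For pods to actually verify the API server with this CA, the API server would also need to serve a certificate signed by it (kube-apiserver accepts --tls-cert-file and --tls-private-key-file for that), which is not shown.

```yaml
# Sketch: kube-controller-manager unit from the master cloud-config,
# with --root-ca-file added. Everything else is unchanged from the question.
# ASSUMPTION: /etc/kubernetes/ssl/ca.pem is a hypothetical path to a
# PEM-encoded CA bundle that already exists on the master.
- name: kube-controller-manager.service
  command: start
  content: |
    [Unit]
    Description=Kubernetes Controller Manager
    Documentation=https://github.com/kubernetes/kubernetes
    Requires=kube-apiserver.service
    After=kube-apiserver.service

    [Service]
    ExecStartPre=/usr/bin/curl -L -o /opt/bin/kube-controller-manager -z /opt/bin/kube-controller-manager https://storage.googleapis.com/kubernetes-release/release/v1.4.5/bin/linux/amd64/kube-controller-manager
    ExecStartPre=/usr/bin/chmod +x /opt/bin/kube-controller-manager
    ExecStart=/opt/bin/kube-controller-manager \
    --service-account-private-key-file=/opt/bin/kube-serviceaccount.key \
    --root-ca-file=/etc/kubernetes/ssl/ca.pem \
    --master=127.0.0.1:8080 \
    --logtostderr=true
    Restart=always
    RestartSec=10
```

With a flag like this in place, service account token secrets created by the controller manager should carry a ca.crt entry alongside the token, which is the file the kubedns log reports as missing.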

I would highly recommend taking a look at the current version of the CoreOS Kubernetes Step-by-Step documentation, which is up to date with Kubernetes v1.4.3. There have been several changes, and it includes documentation on how to generate and use self-signed certificates to secure your cluster and provide valid Service Account tokens.

-- chaosaffe
Source: StackOverflow

11/22/2016

Out of courtesy, I am posting my solution, which might also help somebody in the same situation. I am using cloud-init to start the Kubernetes services and running DNS in a pod. I realised that the pod is running in a different network, so I modified my dns-addon.yaml to pass an extra argument to the kubedns container with the correct master address: "- --kube-master-url=http://10.10.1.31:8080". That did the trick. The error message was slightly misleading!
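
For illustration, the kubedns container section of the replication controller would then look roughly like this. Only the last argument is new; the rest is copied from the dns-addon.yaml in the question, and the other container fields (resources, probes, ports) are left out of this sketch for brevity.

```yaml
# Fragment of the kube-dns-v20 pod template (kubedns container only).
# 10.10.1.31:8080 is the master's insecure API address already used by the
# kubelet and kube-proxy in the node cloud-config.
containers:
- name: kubedns
  image: gcr.io/google_containers/kubedns-amd64:1.8
  args:
  - --domain=cluster.local.
  - --dns-port=10053
  # Point kubedns at the insecure port directly instead of the default
  # https://10.244.0.1:443 endpoint, which needs the service account CA.
  - --kube-master-url=http://10.10.1.31:8080
```

Note that this sidesteps TLS and authentication for kubedns only; other pods that rely on the service account ca.crt would presumably still hit the same error.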

-- Gapmeister66
Source: StackOverflow