K
Q

Question

Dask Kubernetes worker pod giving error status

11/20/2019

I am following link: https://kubernetes.dask.org/en/latest/, to run dask array on Kubernetes cluster. While running the example code, the worker pod is showing error status as below:

Steps:

Installed Kubernetes on 3 nodes(1 Master and 2 workers).
pip install dask-kubernetes
dask_example.py with code to run dask array (same as example given on link)
Worker-spec.yml file with pod configuration (same as example given on link)

(base) [root@k8s-master example]# ls
dask_example.py  worker-spec.yml
(base) [root@k8s-master example]# nohup python dask_example.py &
[1] 3660
(base) [root@k8s-master example]# cat nohup.out
distributed.scheduler - INFO - Clear task state
distributed.scheduler - INFO -   Scheduler at:   tcp://172.16.0.76:40119
distributed.scheduler - INFO - Receive client connection: Client-df4caa18-0bc8-11ea-8e4c-12bd5ffa93ff
distributed.core - INFO - Starting established connection
(base) [root@k8s-master example]# kubectl get pods -o wide --all-namespaces
NAMESPACE     NAME                                 READY   STATUS    RESTARTS   AGE     IP             NODE           NOMINATED NODE   READINESS GATES
default       workerpod                            1/1     Running   0          70s     10.32.0.2      worker-node1   <none>           <none>
kube-system   coredns-5644d7b6d9-l4jsd             1/1     Running   0          8m19s   10.32.0.4      k8s-master     <none>           <none>
kube-system   coredns-5644d7b6d9-q679h             1/1     Running   0          8m19s   10.32.0.3      k8s-master     <none>           <none>
kube-system   etcd-k8s-master                      1/1     Running   0          7m16s   172.16.0.76    k8s-master     <none>           <none>
kube-system   kube-apiserver-k8s-master            1/1     Running   0          7m1s    172.16.0.76    k8s-master     <none>           <none>
kube-system   kube-controller-manager-k8s-master   1/1     Running   0          7m27s   172.16.0.76    k8s-master     <none>           <none>
kube-system   kube-proxy-ctgj8                     1/1     Running   0          5m7s    172.16.0.114   worker-node2   <none>           <none>
kube-system   kube-proxy-f78bm                     1/1     Running   0          8m18s   172.16.0.76    k8s-master     <none>           <none>
kube-system   kube-proxy-ksk59                     1/1     Running   0          5m15s   172.16.0.31    worker-node1   <none>           <none>
kube-system   kube-scheduler-k8s-master            1/1     Running   0          7m2s    172.16.0.76    k8s-master     <none>           <none>
kube-system   weave-net-q2zwn                      2/2     Running   0          6m22s   172.16.0.76    k8s-master     <none>           <none>
kube-system   weave-net-r9tzs                      2/2     Running   0          5m15s   172.16.0.31    worker-node1   <none>           <none>
kube-system   weave-net-tm8xx                      2/2     Running   0          5m7s    172.16.0.114   worker-node2   <none>           <none>
(base) [root@k8s-master example]# kubectl get pods -o wide --all-namespaces
NAMESPACE     NAME                                 READY   STATUS    RESTARTS   AGE     IP             NODE           NOMINATED NODE   READINESS GATES
default       workerpod                            0/1     Error     0          4m23s   10.32.0.2      worker-node1   <none>           <none>
kube-system   coredns-5644d7b6d9-l4jsd             1/1     Running   0          11m     10.32.0.4      k8s-master     <none>           <none>
kube-system   coredns-5644d7b6d9-q679h             1/1     Running   0          11m     10.32.0.3      k8s-master     <none>           <none>
kube-system   etcd-k8s-master                      1/1     Running   0          10m     172.16.0.76    k8s-master     <none>           <none>
kube-system   kube-apiserver-k8s-master            1/1     Running   0          10m     172.16.0.76    k8s-master     <none>           <none>
kube-system   kube-controller-manager-k8s-master   1/1     Running   0          10m     172.16.0.76    k8s-master     <none>           <none>
kube-system   kube-proxy-ctgj8                     1/1     Running   0          8m20s   172.16.0.114   worker-node2   <none>           <none>
kube-system   kube-proxy-f78bm                     1/1     Running   0          11m     172.16.0.76    k8s-master     <none>           <none>
kube-system   kube-proxy-ksk59                     1/1     Running   0          8m28s   172.16.0.31    worker-node1   <none>           <none>
kube-system   kube-scheduler-k8s-master            1/1     Running   0          10m     172.16.0.76    k8s-master     <none>           <none>
kube-system   weave-net-q2zwn                      2/2     Running   0          9m35s   172.16.0.76    k8s-master     <none>           <none>
kube-system   weave-net-r9tzs                      2/2     Running   0          8m28s   172.16.0.31    worker-node1   <none>           <none>
kube-system   weave-net-tm8xx                      2/2     Running   0          8m20s   172.16.0.114   worker-node2   <none>           <none>
(base) [root@k8s-master example]# cat nohup.out
distributed.scheduler - INFO - Clear task state
distributed.scheduler - INFO -   Scheduler at:   tcp://172.16.0.76:40119
distributed.scheduler - INFO - Receive client connection: Client-df4caa18-0bc8-11ea-8e4c-12bd5ffa93ff
distributed.core - INFO - Starting established connection
 (base) [root@k8s-master example]# kubectl describe pod workerpod
Name:         workerpod
Namespace:    default
Priority:     0
Node:         worker-node1/172.16.0.31
Start Time:   Wed, 20 Nov 2019 19:06:36 +0000
Labels:       app=dask
          dask.org/cluster-name=dask-root-99dcf768-4
          dask.org/component=worker
          foo=bar
          user=root
Annotations:  <none>
Status:       Failed
 IP:           10.32.0.2
 IPs:
   IP:  10.32.0.2
Containers:
  dask:
    Container ID:  docker://578dc575fc263c4a3889a4f2cb5e06cd82a00e03cfc6acfd7a98fef703421390
    Image:         daskdev/dask:latest
    Image ID:      docker-pullable://daskdev/dask@sha256:0a936daa94c82cea371c19a2c90c695688ab4e1e7acc905f8b30dfd419adfb6f
Port:          <none>
Host Port:     <none>
Args:
  dask-worker
  --nthreads
  2
  --no-bokeh
  --memory-limit
  6GB
  --death-timeout
  60
State:          Terminated
  Reason:       Error
  Exit Code:    1
  Started:      Wed, 20 Nov 2019 19:06:38 +0000
  Finished:     Wed, 20 Nov 2019 19:08:20 +0000
Ready:          False
Restart Count:  0
Limits:
  cpu:     2
  memory:  6G
Requests:
  cpu:     2
  memory:  6G
Environment:
  EXTRA_PIP_PACKAGES:      fastparquet git+https://github.com/dask/distributed
  DASK_SCHEDULER_ADDRESS:  tcp://172.16.0.76:40119
Mounts:
  /var/run/secrets/kubernetes.io/serviceaccount from default-token-p9f9v (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  default-token-p9f9v:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-p9f9v
    Optional:    false
QoS Class:       Guaranteed
Node-Selectors:  <none>
Tolerations:     k8s.dask.org/dedicated=worker:NoSchedule
                 k8s.dask.org_dedicated=worker:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason     Age    From                   Message
  ----    ------     ----   ----                   -------
  Normal  Scheduled  5m47s  default-scheduler      Successfully assigned default/workerpod to worker-node1
  Normal  Pulled     5m45s  kubelet, worker-node1  Container image "daskdev/dask:latest" already present on machine
  Normal  Created    5m45s  kubelet, worker-node1  Created container dask
  Normal  Started    5m45s  kubelet, worker-node1  Started container dask
(base) [root@k8s-master example]#
(base) [root@k8s-master example]# kubectl get events
LAST SEEN   TYPE     REASON                    OBJECT              MESSAGE
21m         Normal   Starting                  node/k8s-master     Starting kubelet.
21m         Normal   NodeHasSufficientMemory   node/k8s-master     Node k8s-master status is now: NodeHasSufficientMemory
21m         Normal   NodeHasNoDiskPressure     node/k8s-master     Node k8s-master status is now: NodeHasNoDiskPressure
21m         Normal   NodeHasSufficientPID      node/k8s-master     Node k8s-master status is now: NodeHasSufficientPID
21m         Normal   NodeAllocatableEnforced   node/k8s-master     Updated Node Allocatable limit across pods
21m         Normal   RegisteredNode            node/k8s-master     Node k8s-master event: Registered Node k8s-master in Controller
21m         Normal   Starting                  node/k8s-master     Starting kube-proxy.
18m         Normal   Starting                  node/worker-node1   Starting kubelet.
18m         Normal   NodeHasSufficientMemory   node/worker-node1   Node worker-node1 status is now: NodeHasSufficientMemory
18m         Normal   NodeHasNoDiskPressure     node/worker-node1   Node worker-node1 status is now: NodeHasNoDiskPressure
18m         Normal   NodeHasSufficientPID      node/worker-node1   Node worker-node1 status is now: NodeHasSufficientPID
18m         Normal   NodeAllocatableEnforced   node/worker-node1   Updated Node Allocatable limit across pods
18m         Normal   Starting                  node/worker-node1   Starting kube-proxy.
18m         Normal   RegisteredNode            node/worker-node1   Node worker-node1 event: Registered Node worker-node1 in Controller
17m         Normal   NodeReady                 node/worker-node1   Node worker-node1 status is now: NodeReady
18m         Normal   Starting                  node/worker-node2   Starting kubelet.
18m         Normal   NodeHasSufficientMemory   node/worker-node2   Node worker-node2 status is now: NodeHasSufficientMemory
18m         Normal   NodeHasNoDiskPressure     node/worker-node2   Node worker-node2 status is now: NodeHasNoDiskPressure
18m         Normal   NodeHasSufficientPID      node/worker-node2   Node worker-node2 status is now: NodeHasSufficientPID
18m         Normal   NodeAllocatableEnforced   node/worker-node2   Updated Node Allocatable limit across pods
18m         Normal   Starting                  node/worker-node2   Starting kube-proxy.
17m         Normal   RegisteredNode            node/worker-node2   Node worker-node2 event: Registered Node worker-node2 in Controller
17m         Normal   NodeReady                 node/worker-node2   Node worker-node2 status is now: NodeReady
14m         Normal   Scheduled                 pod/workerpod       Successfully assigned default/workerpod to worker-node1
14m         Normal   Pulled                    pod/workerpod       Container image "daskdev/dask:latest" already present on machine
14m         Normal   Created                   pod/workerpod       Created container dask
14m         Normal   Started                   pod/workerpod       Started container dask
(base) [root@k8s-master example]#

Update- adding pod logs (as suggested by Dawid Kruk):

(base) [root@k8s-master example]# kubectl logs workerpod
+ '[' '' ']'
+ '[' -e /opt/app/environment.yml ']'
+ echo 'no environment.yml'
+ '[' '' ']'
+ '[' 'fastparquet git+https://github.com/dask/distributed' ']'
+ echo 'EXTRA_PIP_PACKAGES environment variable found.  Installing.'
+ /opt/conda/bin/pip install fastparquet git+https://github.com/dask/distributed
no environment.yml
EXTRA_PIP_PACKAGES environment variable found.  Installing.
Collecting git+https://github.com/dask/distributed
  Cloning https://github.com/dask/distributed to /tmp/pip-req-build-i3_1vo06
  Running command git clone -q https://github.com/dask/distributed /tmp/pip-req-build-i3_1vo06
  fatal: unable to access 'https://github.com/dask/distributed/': Could not resolve host: github.com
ERROR: Command errored out with exit status 128: git clone -q https://github.com/dask/distributed /tmp/pip-req-build-i3_1vo06 Check the logs for full command output.
+ exec dask-worker --nthreads 2 --no-bokeh --memory-limit 6GB --death-timeout 60
/opt/conda/lib/python3.7/site-packages/distributed/cli/dask_worker.py:252: UserWarning: The --bokeh/--no-bokeh flag has been renamed to --dashboard/--no-dashboard.
  "The --bokeh/--no-bokeh flag has been renamed to --dashboard/--no-dashboard. "
distributed.nanny - INFO -         Start Nanny at: 'tcp://10.32.0.2:45097'
distributed.worker - INFO -       Start worker at:      tcp://10.32.0.2:36389
distributed.worker - INFO -          Listening to:      tcp://10.32.0.2:36389
distributed.worker - INFO - Waiting to connect to:    tcp://172.16.0.76:43389
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO -               Threads:                          2
distributed.worker - INFO -                Memory:                    6.00 GB
distributed.worker - INFO -       Local Directory:           /worker-55rpow8j
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO - Waiting to connect to:    tcp://172.16.0.76:43389
distributed.worker - INFO - Waiting to connect to:    tcp://172.16.0.76:43389
distributed.worker - INFO - Waiting to connect to:    tcp://172.16.0.76:43389
distributed.worker - INFO - Waiting to connect to:    tcp://172.16.0.76:43389
distributed.worker - INFO - Waiting to connect to:    tcp://172.16.0.76:43389
distributed.nanny - INFO - Closing Nanny at 'tcp://10.32.0.2:45097'
distributed.worker - INFO - Stopping worker at tcp://10.32.0.2:36389
distributed.worker - INFO - Closed worker has not yet started: None
distributed.dask_worker - INFO - Timed out starting worker
distributed.dask_worker - INFO - End worker
(base) [root@k8s-master example]# git
usage: git [--version] [--help] [-C <path>] [-c <name>=<value>]
           [--exec-path[=<path>]] [--html-path] [--man-path] [--info-path]

Seems to me like dask pod connection to worker node is issue but I see worker nodes in ready state and other pods (nginx) are running on worker nodes:

(base) [root@k8s-master example]# kubectl get nodes
NAME           STATUS   ROLES    AGE   VERSION
k8s-master     Ready    master   20h   v1.16.3
worker-node1   Ready    worker   20h   v1.16.2
worker-node2   Ready    worker   20h   v1.16.2
NAMESPACE     NAME                                 READY   STATUS    RESTARTS   AGE   IP             NODE           NOMINATED NODE   READINESS GATES
default       nginx-deployment-54f57cf6bf-b9nfd    1/1     Running   0          34s   10.32.0.2      worker-node1   <none>           <none>
default       nginx-deployment-54f57cf6bf-pnp59    1/1     Running   0          34s   10.40.0.0      worker-node2   <none>           <none>
default       workerpod                            0/1     Error     0          56m   10.32.0.2      worker-node1   <none>           <none>
kube-system   coredns-5644d7b6d9-l4jsd             1/1     Running   0          21h   10.32.0.4      k8s-master     <none>           <none>
kube-system   coredns-5644d7b6d9-q679h             1/1     Running   0          21h   10.32.0.3      k8s-master     <none>           <none>
kube-system   etcd-k8s-master                      1/1     Running   0          21h   172.16.0.76    k8s-master     <none>           <none>
kube-system   kube-apiserver-k8s-master            1/1     Running   0          21h   172.16.0.76    k8s-master     <none>           <none>
kube-system   kube-controller-manager-k8s-master   1/1     Running   0          21h   172.16.0.76    k8s-master     <none>           <none>
kube-system   kube-proxy-ctgj8                     1/1     Running   0          21h   172.16.0.114   worker-node2   <none>           <none>
kube-system   kube-proxy-f78bm                     1/1     Running   0          21h   172.16.0.76    k8s-master     <none>           <none>
kube-system   kube-proxy-ksk59                     1/1     Running   0          21h   172.16.0.31    worker-node1   <none>           <none>
kube-system   kube-scheduler-k8s-master            1/1     Running   0          21h   172.16.0.76    k8s-master     <none>           <none>
kube-system   weave-net-q2zwn                      2/2     Running   0          21h   172.16.0.76    k8s-master     <none>           <none>
kube-system   weave-net-r9tzs                      2/2     Running   0          21h   172.16.0.31    worker-node1   <none>           <none>
kube-system   weave-net-tm8xx                      2/2     Running   0          21h   172.16.0.114   worker-node2   <none>           <none>

Update2 - Added nslookup output (Suggested by VAS) Ref: https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/#create-a-simple-pod-to-use-as-a-test-environment

(base) [root@k8s-master example]# kubectl exec -ti workerpod -- nslookup kubernetes.default
OCI runtime exec failed: exec failed: container_linux.go:345: starting container process caused "exec: \"nslookup\": executable file not found in $PATH": unknown
command terminated with exit code 126

How to add executable file to pod? It is set on my host.

(base) [root@k8s-master example]# nslookup github.com
Server:         172.31.0.2
Address:        172.31.0.2#53

Non-authoritative answer:
Name:   github.com
Address: 140.82.114.3

Update 3: nslookup for dnsutils (Suggested by VAS)

(base) [root@k8s-master example]# kubectl run dnsutils -it --rm=true --restart=Never --image=tutum/dnsutils cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local ec2.internal
options ndots:5
pod "dnsutils" deleted
(base) [root@k8s-master example]# kubectl run dnsutils -it --restart=Never --image=tutum/dnsutils nslookup github.com
If you don't see a command prompt, try pressing enter.
;; connection timed out; no servers could be reached

pod default/dnsutils terminated (Error)
(base) [root@k8s-master example]# kubectl logs dnsutils

;; connection timed out; no servers could be reached

(base) [root@k8s-master example]#

Update 4:

(base) [root@k8s-master example]# kubectl exec -ti busybox -- nslookup kubernetes.default
Server:    10.96.0.10
Address 1: 10.96.0.10

nslookup: can't resolve 'kubernetes.default'
command terminated with exit code 1
(base) [root@k8s-master example]# kubectl get pods --namespace=kube-system -l k8s-app=kube-dns
NAME                       READY   STATUS    RESTARTS   AGE
coredns-5644d7b6d9-l4jsd   1/1     Running   0          25h
coredns-5644d7b6d9-q679h   1/1     Running   0          25h
(base) [root@k8s-master example]# kubectl get pods --namespace=kube-system -l k8s-app=kube-dns
NAME                       READY   STATUS    RESTARTS   AGE
coredns-5644d7b6d9-l4jsd   1/1     Running   0          25h
coredns-5644d7b6d9-q679h   1/1     Running   0          25h
(base) [root@k8s-master example]# for p in $(kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name); do kubectl logs --namespace=kube-system $p; done
.:53
2019-11-20T19:01:42.161Z [INFO] plugin/reload: Running configuration MD5 = f64cb9b977c7dfca58c4fab108535a76
2019-11-20T19:01:42.161Z [INFO] CoreDNS-1.6.2
2019-11-20T19:01:42.161Z [INFO] linux/amd64, go1.12.8, 795a3eb
CoreDNS-1.6.2
linux/amd64, go1.12.8, 795a3eb
.:53
2019-11-20T19:01:41.862Z [INFO] plugin/reload: Running configuration MD5 = f64cb9b977c7dfca58c4fab108535a76
2019-11-20T19:01:41.862Z [INFO] CoreDNS-1.6.2
2019-11-20T19:01:41.862Z [INFO] linux/amd64, go1.12.8, 795a3eb
CoreDNS-1.6.2
linux/amd64, go1.12.8, 795a3eb
(base) [root@k8s-master example]# kubectl get service
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   26h
(base) [root@k8s-master example]# kubectl get ep kube-dns --namespace=kube-system
NAME       ENDPOINTS                                            AGE
kube-dns   10.32.0.3:53,10.32.0.4:53,10.32.0.3:53 + 3 more...   26h
(base) [root@k8s-master example]# kubectl -n kube-system edit configmap coredns
Edit cancelled, no changes made.

nslookup from worker node1

[root@worker-node1 ec2-user]# nslookup 10.96.0.10
Server:         172.31.0.2
Address:        172.31.0.2#53

Non-authoritative answer:
10.0.96.10.in-addr.arpa name = ip-10-96-0-10.ec2.internal.

Authoritative answers can be found from:

[root@worker-node1 ec2-user]# nslookup 10.96.0.1
Server:         172.31.0.2
Address:        172.31.0.2#53

Non-authoritative answer:
1.0.96.10.in-addr.arpa  name = ip-10-96-0-1.ec2.internal.

Authoritative answers can be found from:

[root@worker-node1 ec2-user]#

nslookup from worker node2

[root@worker-node2 ec2-user]# nslookup 10.96.0.10
Server:         172.31.0.2
Address:        172.31.0.2#53

Non-authoritative answer:
10.0.96.10.in-addr.arpa name = ip-10-96-0-10.ec2.internal.

Authoritative answers can be found from:

[root@worker-node2 ec2-user]# nslookup 10.96.0.1
Server:         172.31.0.2
Address:        172.31.0.2#53

Non-authoritative answer:
1.0.96.10.in-addr.arpa  name = ip-10-96-0-1.ec2.internal.

Authoritative answers can be found from:

-- Manvi

dask

dask-distributed

dask-kubernetes

kubernetes

python

1 Answer

11/21/2019

Update:

As I can see from the comments, I also need to cover some basic concepts in the answer.

To ensure that Kubernetes cluster works well some requirements should be fulfilled.

All Kubernetes nodes must have full network connectivity.
It means that any cluster node should be able to communicate with any other cluster node using any network protocol and any port (in case of tcp/udp), without NAT. Some cloud environments require additional custom firewall rules to accomplish that. Calico example
Kubernetes Pods should be able to communicate with pods scheduled on the other nodes.
This functionality is provided by [CNI network add-on]. Most popular add-ons require additional option in Kubernetes control plane, which usually set by kubeadm init --pod-network-cidr=a.b.c.d/16 command line option. Note, that default IP subnets for different network add-ons are not the same.
If you want to use custom Pod subnet for the particular network add-on, you have to customize network add-on deployment YAML file before applying it to the cluster.
The inter Pod connectivity could be easily tested by sending ICMP or curl requests from node CLI or Pod CLI to any IP address of Pod, scheduled on another node. Note, that Service ClusterIP doesn't respond on ICMP requests, because it's nothing more that set of iptables forwarding rules. The full list of pods with node names could be showed using the following command:
```
kubectl get pods --all-namespaces -o wide
```
For service discovery functionality, working DNS service in Kubernetes cluster is the must.
Usually it's kubedns for Kubernetes versions prior v1.9 and coredns for newer clusters.
Kubernetes DNS service usually contains one deployment with two replicas and one ClusterIP Service with default IP address 10.96.0.10.

Looking at the data in the question I suspect that you may have problem with network add-on. I would test it using the following commands, which should return successful results on the healthy cluster:

# check connectivity from the master node to nginx pods:
k8s-master$ ping 10.32.0.2
k8s-master$ curl 10.32.0.2 
k8s-master$ ping 10.40.0.0
k8s-master$ curl 10.40.0.0

# check connectivity from other nodes to coredns pods and DNS Service:
worker-node1$ nslookup github.com 10.32.0.4
worker-node1$ nslookup github.com 10.32.0.3
worker-node1$ nslookup github.com 10.96.0.10

worker-node2$ nslookup github.com 10.32.0.4
worker-node2$ nslookup github.com 10.32.0.3
worker-node2$ nslookup github.com 10.96.0.10

Troubleshooting network plugin is a quite big piece of knowledge to write in one answer, so, if you need to fix network add-on, please do some search through existing answers and ask another question in case you found nothing suitable.

Below part describes how to check DNS service in Kubernetes cluster:

In case of dnsPolicy ClusterFirst(which is default) any DNS query that does not match the configured cluster domain suffix, such as “www.kubernetes.io”, is forwarded to the upstream nameserver inherited from the node.

How to check DNS client configuration on the cluster node:

$ cat /etc/resolv.conf
$ systemd-resolve --status

How to check if DNS client on the node works well:

$ nslookup github.com

Server:         127.0.0.53
Address:        127.0.0.53#53

Non-authoritative answer:
Name:   github.com
Address: 140.82.118.4

How to get Kubernetes cluster DNS configuration:

$ kubectl get svc,pods -n kube-system -o wide | grep dns 

service/kube-dns        ClusterIP   10.96.0.10      <none>        53/UDP,53/TCP,9153/TCP   185d   k8s-app=kube-dns

pod/coredns-fb8b8dccf-5jjv8                     1/1     Running   122        185d   10.244.0.16   kube-master2   <none>           <none>
pod/coredns-fb8b8dccf-5pbkg                     1/1     Running   122        185d   10.244.0.17   kube-master2   <none>           <none>

$ kubectl get configmap coredns -n kube-system -o yaml

apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           upstream
           fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  creationTimestamp: "2019-05-20T16:10:42Z"
  name: coredns
  namespace: kube-system
  resourceVersion: "1657005"
  selfLink: /api/v1/namespaces/kube-system/configmaps/coredns
  uid: d1598034-7b19-11e9-9137-42010a9c0004

How to check if the cluster DNS service (coredns) works well:

$ nslookup github.com 10.96.0.10

Server:         10.96.0.10
Address:        10.96.0.10#53

Non-authoritative answer:
Name:   github.com
Address: 140.82.118.4

How to check if the regular pod can resolve particular DNS name:

$ kubectl run dnsutils -it --rm=true --restart=Never --image=tutum/dnsutils cat /etc/resolv.conf

nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local 
options ndots:5

$ kubectl run dnsutils -it --rm=true --restart=Never --image=tutum/dnsutils nslookup github.com

Server:         10.96.0.10
Address:        10.96.0.10#53

Non-authoritative answer:
Name:   github.com
Address: 140.82.118.3

More details about DNS troubleshooting can be found in the official documentation

-- VAS

Source: StackOverflow

KQ

Dask Kubernetes worker pod giving error status

Similar Questions

1 Answer

K
Q