I'd appreciate any help in getting to the root cause of this failure. I brought up a Kubernetes v1.11.0 cluster (1 master and 2 nodes, with the master and one node on the same machine) on CentOS 7.5 VMs.
$ uname -a
Linux master.home 3.10.0-862.14.4.el7.x86_64 #1 SMP Wed Sep 26 15:12:11 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/centos-release
CentOS Linux release 7.5.1804 (Core)
$ docker version
Client:
Version: 17.09.1-ce
API version: 1.32
Go version: go1.8.3
Git commit: 19e2cf6
Built: Thu Dec 7 22:23:40 2017
OS/Arch: linux/amd64
Server:
Version: 17.09.1-ce
API version: 1.32 (minimum version 1.12)
Go version: go1.8.3
Git commit: 19e2cf6
Built: Thu Dec 7 22:25:03 2017
OS/Arch: linux/amd64
Experimental: false
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.0", GitCommit:"91e7b4fd31fcd3d5f436da26c980becec37ceefe", GitTreeState:"clean", BuildDate:"2018-06-27T20:17:28Z", GoVersion:"go1.10.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.0", GitCommit:"91e7b4fd31fcd3d5f436da26c980becec37ceefe", GitTreeState:"clean", BuildDate:"2018-06-27T20:08:34Z", GoVersion:"go1.10.2", Compiler:"gc", Platform:"linux/amd64"}
The node interface IP addresses look good to me:
$ ip address
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:5b:19:5f brd ff:ff:ff:ff:ff:ff
inet 192.168.1.111/24 brd 192.168.1.255 scope global dynamic enp0s3
valid_lft 82102sec preferred_lft 82102sec
inet6 fe80::a00:27ff:fe5b:195f/64 scope link tentative dadfailed
valid_lft forever preferred_lft forever
3: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 02:42:38:a6:c7:bd brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 scope global docker0
valid_lft forever preferred_lft forever
inet6 fe80::42:38ff:fea6:c7bd/64 scope link
valid_lft forever preferred_lft forever
4: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
link/ether 26:1e:c3:e9:a3:db brd ff:ff:ff:ff:ff:ff
inet 10.150.69.0/32 scope global flannel.1
valid_lft forever preferred_lft forever
inet6 fe80::241e:c3ff:fee9:a3db/64 scope link
valid_lft forever preferred_lft forever
12: vethc0ae215@if11: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default
link/ether 9a:1c:9d:21:18:57 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet6 fe80::981c:9dff:fe21:1857/64 scope link
valid_lft forever preferred_lft forever
The etcd cluster is healthy:
$ etcdctl --endpoints=https://192.168.1.111:2379 --cert-file=/var/lib/kubernetes/kubernetes.pem --key-file=/var/lib/kubernetes/kubernetes-key.pem cluster-health
member ca38fd8eb3e17372 is healthy: got healthy result from https://192.168.1.111:2379
cluster is healthy
$ etcdctl --endpoints=https://192.168.1.111:2379 --cert-file=/var/lib/kubernetes/kubernetes.pem --key-file=/var/lib/kubernetes/kubernetes-key.pem get /atomic.io/network/config
{ "Network": "10.150.0.0/16", "SubnetLen": 24, "Backend": {"Type": "vxlan"}}
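As a sanity check that the numbers line up, the flannel.1 address shown earlier (10.150.69.0) should sit inside this Network range. A throwaway bash helper (purely illustrative, not part of any tooling) confirms the CIDR membership:

```shell
# Illustrative helper: check whether an IPv4 address falls inside a CIDR.
ip2int() {
  # Convert a dotted-quad IPv4 address to a 32-bit integer.
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) + (b << 16) + (c << 8) + d ))
}

in_cidr() {
  # Usage: in_cidr <ip> <network/bits>; exits 0 if <ip> is inside the CIDR.
  local ip net bits mask
  ip=$(ip2int "$1")
  net=$(ip2int "${2%/*}")
  bits=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}

in_cidr 10.150.69.0 10.150.0.0/16 && echo "flannel.1 is inside the pod network"
# prints: flannel.1 is inside the pod network
```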
Also updated iptables:
$ iptables --version
iptables v1.6.2
Overview from kubectl:
$ kubectl get all --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system pod/coredns-55f86bf584-9vz6k 1/1 Running 11 39m
kube-system pod/coredns-55f86bf584-z4nvv 1/1 Running 11 39m
kube-system pod/kube-flannel-ds-amd64-kw972 0/1 CrashLoopBackOff 6 10m
kube-system pod/kube-flannel-ds-amd64-rhv2c 0/1 CrashLoopBackOff 6 10m
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/kubernetes ClusterIP 10.32.0.1 <none> 443/TCP 2h
kube-system service/kube-dns ClusterIP 10.32.0.10 <none> 53/UDP,53/TCP 39m
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
kube-system daemonset.apps/kube-flannel-ds-amd64 2 2 0 2 0 beta.kubernetes.io/arch=amd64 10m
kube-system daemonset.apps/kube-flannel-ds-arm 0 0 0 0 0 beta.kubernetes.io/arch=arm 10m
kube-system daemonset.apps/kube-flannel-ds-arm64 0 0 0 0 0 beta.kubernetes.io/arch=arm64 10m
kube-system daemonset.apps/kube-flannel-ds-ppc64le 0 0 0 0 0 beta.kubernetes.io/arch=ppc64le 10m
kube-system daemonset.apps/kube-flannel-ds-s390x 0 0 0 0 0 beta.kubernetes.io/arch=s390x 10m
NAMESPACE NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
kube-system deployment.apps/coredns 2 2 2 2 39m
NAMESPACE NAME DESIRED CURRENT READY AGE
kube-system replicaset.apps/coredns-55f86bf584 2 2 2 39m
I used this manifest for flannel, where I changed "Network": "10.244.0.0/16" to "Network": "10.150.0.0/16":
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
And this one for CoreDNS:
kubectl apply -f https://storage.googleapis.com/kubernetes-the-hard-way/coredns.yaml
I'm not sure why I see complaints about x509 certificates in the logs of the respective pods:
$ kubectl logs kube-flannel-ds-amd64-kw972 -n kube-system
I1126 14:51:38.415251 1 main.go:475] Determining IP address of default interface
I1126 14:51:38.417393 1 main.go:488] Using interface with name enp0s3 and address 192.168.1.111
I1126 14:51:38.417535 1 main.go:505] Defaulting external address to interface address (192.168.1.111)
E1126 14:51:38.427865 1 main.go:232] Failed to create SubnetManager: error retrieving pod spec for 'kube-system/kube-flannel-ds-amd64-kw972': Get https://10.32.0.1:443/api/v1/namespaces/kube-system/pods/kube-flannel-ds-amd64-kw972: x509: certificate is valid for 192.168.1.111, 127.0.0.1, not 10.32.0.1
$ kubectl logs coredns-55f86bf584-z4nvv -n kube-system
E1126 14:50:51.845470 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:355: Failed to list *v1.Namespace: Get https://10.32.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: x509: certificate is valid for 192.168.1.111, 127.0.0.1, not 10.32.0.1
E1126 14:50:51.850446 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:348: Failed to list *v1.Service: Get https://10.32.0.1:443/api/v1/services?limit=500&resourceVersion=0: x509: certificate is valid for 192.168.1.111, 127.0.0.1, not 10.32.0.1
Here 192.168.1.111 is my master node and 10.32.0.1 is the kubernetes service ClusterIP.
I did not use kubeadm to bring up this cluster; I did most of the bootstrapping by following https://github.com/kelseyhightower/kubernetes-the-hard-way
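Based on the log lines above, my guess is that the API server certificate was generated without the service ClusterIP (10.32.0.1) in its Subject Alternative Names. In the kubernetes-the-hard-way flow that certificate comes from cfssl, but the required SAN set can be sketched with plain openssl (1.1.1+ for -addext; the /tmp paths and self-signed setup are made up purely for illustration):

```shell
# Illustrative only -- the real cert should be regenerated with the existing
# CA tooling (cfssl in kubernetes-the-hard-way), not self-signed like here.
# The point is the SAN list: it must include the service ClusterIP 10.32.0.1
# alongside the node IP and loopback, or in-cluster clients such as flannel
# and CoreDNS will fail TLS verification exactly as in the logs above.
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout /tmp/apiserver-key.pem -out /tmp/apiserver.pem \
  -days 365 -subj "/CN=kubernetes" \
  -addext "subjectAltName=IP:192.168.1.111,IP:127.0.0.1,IP:10.32.0.1,DNS:kubernetes,DNS:kubernetes.default,DNS:kubernetes.default.svc.cluster.local"

# Verify the SANs actually include the service IP:
openssl x509 -in /tmp/apiserver.pem -noout -text | grep -A1 "Subject Alternative Name"
```

After regenerating the real certificate this way (signed by the existing CA), kube-apiserver needs a restart to pick it up.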
I'm also not sure whether SNAT is set up right:
$ sudo conntrack -L -d 10.32.0.1
tcp 6 17 TIME_WAIT src=192.168.1.111 dst=10.32.0.1 sport=37862 dport=443 src=192.168.1.111 dst=192.168.1.111 sport=6443 dport=37862 [ASSURED] mark=0 use=1
conntrack v1.4.4 (conntrack-tools): 1 flow entries have been shown.
$ sudo iptables -t nat -L KUBE-SERVICES
Chain KUBE-SERVICES (2 references)
target prot opt source destination
KUBE-MARK-MASQ udp -- !10.150.0.0/16 10.32.0.10 /* kube-system/kube-dns:dns cluster IP */ udp dpt:domain
KUBE-SVC-TCOU7JCQXEZGVUNU udp -- anywhere 10.32.0.10 /* kube-system/kube-dns:dns cluster IP */ udp dpt:domain
KUBE-MARK-MASQ tcp -- !10.150.0.0/16 10.32.0.10 /* kube-system/kube-dns:dns-tcp cluster IP */ tcp dpt:domain
KUBE-SVC-ERIFXISQEP7F7OF4 tcp -- anywhere 10.32.0.10 /* kube-system/kube-dns:dns-tcp cluster IP */ tcp dpt:domain
KUBE-MARK-MASQ tcp -- !10.150.0.0/16 10.32.0.1 /* default/kubernetes:https cluster IP */ tcp dpt:https
KUBE-SVC-NPX46M4PTMTKRN6Y tcp -- anywhere 10.32.0.1 /* default/kubernetes:https cluster IP */ tcp dpt:https
KUBE-NODEPORTS all -- anywhere anywhere /* kubernetes service nodeports; NOTE: this must be the last rule in this chain */ ADDRTYPE match dst-type LOCAL
Flannel config:
$ cat /etc/sysconfig/flanneld
# Flanneld configuration options
# etcd url location. Point this to the server where etcd runs
FLANNEL_ETCD_ENDPOINTS="https://192.168.1.111:2379"
# etcd config key. This is the configuration key that flannel queries
# For address range assignment
FLANNEL_ETCD_PREFIX="/atomic.io/network"
# Any additional options that you want to pass
FLANNEL_OPTIONS="-v=9 --etcd-certfile=/var/lib/kubernetes/kubernetes.pem --etcd-keyfile=/var/lib/kubernetes/kubernetes-key.pem --remote-cafile=/var/lib/kubernetes/ca.pem"
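If flanneld starts cleanly, it writes its lease to /run/flannel/subnet.env. Given the 10.150.69.0 lease on flannel.1 above, I'd expect `cat /run/flannel/subnet.env` to show roughly the following (the exact FLANNEL_SUBNET gateway address and the FLANNEL_IPMASQ value depend on flanneld's flags, so treat this as a sketch):

```
FLANNEL_NETWORK=10.150.0.0/16
FLANNEL_SUBNET=10.150.69.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=false
```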
Edit 1: Updated the title to better reflect the underlying concern. My goal is to ensure DNS works as expected in my k8s cluster. I tested nslookup with the busybox 1.28 image:
$ kubectl exec -ti busybox -- nslookup kubernetes
Server: 10.32.0.10
Address 1: 10.32.0.10
nslookup: can't resolve 'kubernetes'
command terminated with exit code 1
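For comparison, /etc/resolv.conf inside the busybox pod (via `kubectl exec busybox -- cat /etc/resolv.conf`) should point at the kube-dns ClusterIP with the standard search path. Assuming the default cluster domain cluster.local, roughly:

```
nameserver 10.32.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
```

If the nameserver line is not 10.32.0.10, the kubelet's --cluster-dns flag is the first thing I'd recheck.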
Update: The x509 error is gone and CoreDNS is up and running after upgrading Docker to 18.06.1-ce
and editing the kubelet.service file to use: --container-runtime-endpoint=unix:///var/run/docker/containerd/docker-containerd.sock
One step closer, but not there yet.
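For reference, the relevant part of kubelet.service now looks roughly like this (other flags omitted; the --container-runtime=remote line is my assumption of what makes the endpoint flag take effect):

```
[Service]
ExecStart=/usr/local/bin/kubelet \
  --container-runtime=remote \
  --container-runtime-endpoint=unix:///var/run/docker/containerd/docker-containerd.sock
```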
$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default busybox 1/1 Terminating 1 1h
kube-system coredns-55f86bf584-n84nw 1/1 Running 0 10m
kube-system coredns-55f86bf584-zl88b 1/1 Running 0 10m
$ kubectl logs coredns-55f86bf584-n84nw -n kube-system
.:53
2018/11/26 18:49:48 [INFO] CoreDNS-1.2.2
2018/11/26 18:49:48 [INFO] linux/amd64, go1.11, eb51e8b
CoreDNS-1.2.2
linux/amd64, go1.11, eb51e8b
2018/11/26 18:49:48 [INFO] plugin/reload: Running configuration MD5 = 2e2180a5eeb3ebf92a5100ab081a6381