I am trying to build a three-master, three-worker Kubernetes cluster, with three separate etcd servers.
[root@K8sMaster01 ~]# kubectl get nodes
NAME          STATUS    ROLES     AGE       VERSION
k8smaster01   Ready     master    5h        v1.11.1
k8smaster02   Ready     master    4h        v1.11.1
k8smaster03   Ready     master    4h        v1.11.1
k8snode01     Ready     <none>    4h        v1.11.1
k8snode02     Ready     <none>    4h        v1.11.1
k8snode03     Ready     <none>    4h        v1.11.1
I have spent weeks trying to get this to work, but cannot get past one problem: the containers/pods cannot access the API server.
[root@K8sMaster01 ~]# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-17T18:50:16Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
[root@K8sMaster01 ~]# cat /etc/redhat-release
Fedora release 28 (Twenty Eight)
[root@K8sMaster01 ~]# uname -a
Linux K8sMaster01 4.16.3-301.fc28.x86_64 #1 SMP Mon Apr 23 21:59:58 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
NAME                                   READY     STATUS    RESTARTS   AGE
coredns-78fcdf6894-c2wbh               1/1       Running   1          4h
coredns-78fcdf6894-psbtq               1/1       Running   1          4h
heapster-77f99d6b7c-5pxj6              1/1       Running   0          4h
kube-apiserver-k8smaster01             1/1       Running   1          4h
kube-apiserver-k8smaster02             1/1       Running   1          4h
kube-apiserver-k8smaster03             1/1       Running   1          4h
kube-controller-manager-k8smaster01    1/1       Running   1          4h
kube-controller-manager-k8smaster02    1/1       Running   1          4h
kube-controller-manager-k8smaster03    1/1       Running   1          4h
kube-flannel-ds-amd64-542x6            1/1       Running   0          4h
kube-flannel-ds-amd64-6dw2g            1/1       Running   4          4h
kube-flannel-ds-amd64-h6j9b            1/1       Running   1          4h
kube-flannel-ds-amd64-mgggx            1/1       Running   0          3h
kube-flannel-ds-amd64-p8xfk            1/1       Running   0          4h
kube-flannel-ds-amd64-qp86h            1/1       Running   4          4h
kube-proxy-4bqxh                       1/1       Running   0          3h
kube-proxy-56p4n                       1/1       Running   0          3h
kube-proxy-7z8p7                       1/1       Running   0          3h
kube-proxy-b59ns                       1/1       Running   0          3h
kube-proxy-fc6zg                       1/1       Running   0          3h
kube-proxy-wrxg7                       1/1       Running   0          3h
kube-scheduler-k8smaster01             1/1       Running   1          4h
kube-scheduler-k8smaster02             1/1       Running   1          4h
kube-scheduler-k8smaster03             1/1       Running   1          4h
kubernetes-dashboard-6948bdb78-4f7qj   1/1       Running   19         1h
node-problem-detector-v0.1-77fdw       1/1       Running   0          4h
node-problem-detector-v0.1-96pld      	1/1       Running   1          4h
node-problem-detector-v0.1-ctnfn       1/1       Running   0          3h
node-problem-detector-v0.1-q2xvw       1/1       Running   0          4h
node-problem-detector-v0.1-vvf4j       1/1       Running   1          4h
traefik-ingress-controller-7w44f       1/1       Running   0          4h
traefik-ingress-controller-8cprj       1/1       Running   1          4h
traefik-ingress-controller-f6c7q       1/1       Running   0          3h
traefik-ingress-controller-tf8zw       1/1       Running   0          4h
kube-ops-view-6744bdc77d-2x5w8         1/1       Running   0          2h
kube-ops-view-redis-74578dcc5d-5fnvf   1/1       Running   0          2h
The kubernetes-dashboard will not start (note the 19 restarts), and the same goes for kube-ops-view. CoreDNS also logs errors. All of this looks to me like a networking problem. I have tried:
sudo iptables -P FORWARD ACCEPT
sudo iptables --policy FORWARD ACCEPT
sudo iptables -A FORWARD -o flannel.1 -j ACCEPT
CoreDNS gives this error in the logs:
[root@K8sMaster01 ~]# kubectl logs coredns-78fcdf6894-c2wbh -n kube-system
.:53
2018/08/26 15:15:28 [INFO] CoreDNS-1.1.3
2018/08/26 15:15:28 [INFO] linux/amd64, go1.10.1, b0fd575c
2018/08/26 15:15:28 [INFO] plugin/reload: Running configuration MD5 = 2a066f12ec80aeb2b92740dd74c17138
CoreDNS-1.1.3
linux/amd64, go1.10.1, b0fd575c
E0826 17:12:19.624560 1 reflector.go:322] github.com/coredns/coredns/plugin/kubernetes/controller.go:313: Failed to watch *v1.Service: Get https://10.96.0.1:443/api/v1/services?resourceVersion=556&timeoutSeconds=389&watch=true: dial tcp 10.96.0.1:443: i/o timeout
2018/08/26 17:35:34 [ERROR] 2 kube-ops-view-redis.uk.specsavers.com. A: unreachable backend: read udp 10.96.0.7:46862->10.4.4.28:53: i/o timeout
2018/08/26 17:35:34 [ERROR] 2 kube-ops-view-redis.uk.specsavers.com. AAAA: unreachable backend: read udp 10.96.0.7:46690->10.4.4.28:53: i/o timeout
2018/08/26 17:35:37 [ERROR] 2 kube-ops-view-redis.uk.specsavers.com. AAAA: unreachable backend: read udp 10.96.0.7:60267->10.4.4.28:53: i/o timeout
2018/08/26 17:35:37 [ERROR] 2 kube-ops-view-redis.uk.specsavers.com. A: unreachable backend: read udp 10.96.0.7:41482->10.4.4.28:53: i/o timeout
2018/08/26 17:36:58 [ERROR] 2 kube-ops-view-redis.specsavers.local. AAAA: unreachable backend: read udp 10.96.0.7:58042->10.4.4.28:53: i/o timeout
2018/08/26 17:36:58 [ERROR] 2 kube-ops-view-redis.specsavers.local. A: unreachable backend: read udp 10.96.0.7:53149->10.4.4.28:53: i/o timeout
2018/08/26 17:37:01 [ERROR] 2 kube-ops-view-redis.specsavers.local. A: unreachable backend: read udp 10.96.0.7:36861->10.4.4.28:53: i/o timeout
2018/08/26 17:37:01 [ERROR] 2 kube-ops-view-redis.specsavers.local. AAAA: unreachable backend: read udp 10.96.0.7:43235->10.4.4.28:53: i/o timeout
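Every one of those unreachable-backend errors points at the same upstream resolver, 10.4.4.28:53. To confirm that holds over a longer log capture, a quick parse could look like this (shown against an inline sample; in practice the lines would come from a saved log file):

```python
import re

# CoreDNS "unreachable backend" lines end with "read udp SRC->DST: i/o timeout";
# capture DST, the upstream resolver CoreDNS could not reach.
pattern = re.compile(r"unreachable backend: read udp \S+->(\S+): i/o timeout")

sample = [
    "2018/08/26 17:35:34 [ERROR] 2 kube-ops-view-redis.uk.specsavers.com. A: "
    "unreachable backend: read udp 10.96.0.7:46862->10.4.4.28:53: i/o timeout",
    "2018/08/26 17:36:58 [ERROR] 2 kube-ops-view-redis.specsavers.local. AAAA: "
    "unreachable backend: read udp 10.96.0.7:58042->10.4.4.28:53: i/o timeout",
]

upstreams = {m.group(1) for line in sample if (m := pattern.search(line))}
print(upstreams)  # {'10.4.4.28:53'}
```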
The dashboard:
[root@K8sMaster01 ~]# kubectl logs kubernetes-dashboard-6948bdb78-4f7qj -n kube-system
2018/08/26 20:10:31 Starting overwatch
2018/08/26 20:10:31 Using in-cluster config to connect to apiserver
2018/08/26 20:10:31 Using service account token for csrf signing
2018/08/26 20:10:31 No request provided. Skipping authorization
2018/08/26 20:11:01 Error while initializing connection to Kubernetes apiserver. This most likely means that the cluster is misconfigured (e.g., it has invalid apiserver certificates or service accounts configuration) or the --apiserver-host param points to a server that does not exist. Reason: Get https://10.96.0.1:443/version: dial tcp 10.96.0.1:443: i/o timeout
Refer to our FAQ and wiki pages for more information: https://github.com/kubernetes/dashboard/wiki/FAQ
kube-ops-view:
ERROR:kube_ops_view.update:Failed to query cluster 10-96-0-1:443 (https://10.96.0.1:443): ConnectTimeout (try 141, wait 63 seconds)
10.96.3.1 - - [2018-08-26 20:12:34] "GET /health HTTP/1.1" 200 117 0.001002
10.96.3.1 - - [2018-08-26 20:12:44] "GET /health HTTP/1.1" 200 117 0.000921
10.96.3.1 - - [2018-08-26 20:12:54] "GET /health HTTP/1.1" 200 117 0.000926
10.96.3.1 - - [2018-08-26 20:13:04] "GET /health HTTP/1.1" 200 117 0.000924
10.96.3.1 - - [2018-08-26 20:13:14] "GET /health HTTP/1.1" 200 117 0.000942
10.96.3.1 - - [2018-08-26 20:13:24] "GET /health HTTP/1.1" 200 117 0.000924
10.96.3.1 - - [2018-08-26 20:13:34] "GET /health HTTP/1.1" 200 117 0.000939
ERROR:kube_ops_view.update:Failed to query cluster 10-96-0-1:443 (https://10.96.0.1:443): ConnectTimeout (try 142, wait 61 seconds)
Flannel has created the networks:
[root@K8sMaster01 ~]# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:9a:80:f7 brd ff:ff:ff:ff:ff:ff
    inet 10.34.88.182/24 brd 10.34.88.255 scope global dynamic ens192
       valid_lft 7071sec preferred_lft 7071sec
    inet 10.10.40.90/24 brd 10.10.40.255 scope global ens192:1
       valid_lft forever preferred_lft forever
    inet6 fe80::250:56ff:fe9a:80f7/64 scope link
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:cf:ec:b3:ee brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
4: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
    link/ether 06:df:1e:87:b8:ee brd ff:ff:ff:ff:ff:ff
    inet 10.96.0.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet6 fe80::4df:1eff:fe87:b8ee/64 scope link
       valid_lft forever preferred_lft forever
5: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
    link/ether 0a:58:0a:60:00:01 brd ff:ff:ff:ff:ff:ff
    inet 10.96.0.1/24 scope global cni0
       valid_lft forever preferred_lft forever
    inet6 fe80::8c77:39ff:fe6e:8710/64 scope link
       valid_lft forever preferred_lft forever
7: veth9527916b@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP group default
    link/ether 46:62:b6:b8:b9:ac brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::4462:b6ff:feb8:b9ac/64 scope link
       valid_lft forever preferred_lft forever
8: veth6e6f08f5@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP group default
    link/ether 3e:a5:4b:8d:11:ce brd ff:ff:ff:ff:ff:ff link-netnsid 2
    inet6 fe80::3ca5:4bff:fe8d:11ce/64 scope link
       valid_lft forever preferred_lft forever
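One thing that stands out in that output: flannel.1 (10.96.0.0/32) and cni0 (10.96.0.1/24) sit inside 10.96.0.0/12, which is kubeadm's default service CIDR and the range the ClusterIPs come from (e.g. 10.96.0.1 is the kubernetes service VIP). If the pod network overlaps the service network, traffic destined for service VIPs can be routed onto the local bridge instead of through kube-proxy. A quick sanity check with Python's ipaddress module (assuming the /12 default; substitute your actual --service-cidr if it differs):

```python
import ipaddress

service_cidr = ipaddress.ip_network("10.96.0.0/12")  # kubeadm default --service-cidr

# Subnets taken from the `ip addr show` output
pod_nets = {
    "flannel.1": ipaddress.ip_network("10.96.0.0/32"),
    "cni0":      ipaddress.ip_network("10.96.0.0/24"),
}

for name, net in pod_nets.items():
    print(f"{name}: {net} overlaps {service_cidr} -> {net.overlaps(service_cidr)}")
```

Both lines print True here, which is worth comparing against the flannel net-conf and the kubeadm --pod-network-cidr that were used.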
I can ping the IP:
[root@K8sMaster01 ~]# ping 10.96.0.1
PING 10.96.0.1 (10.96.0.1) 56(84) bytes of data.
64 bytes from 10.96.0.1: icmp_seq=1 ttl=64 time=0.052 ms
64 bytes from 10.96.0.1: icmp_seq=2 ttl=64 time=0.032 ms
64 bytes from 10.96.0.1: icmp_seq=3 ttl=64 time=0.042 ms
and telnet to the port:
[root@K8sMaster01 ~]# telnet 10.96.0.1 443
Trying 10.96.0.1...
Connected to 10.96.0.1.
Escape character is '^]'.
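Note, though, that this ping/telnet is from the master host itself, where kube-proxy's host rules apply; the components that are failing are pods, and the path from a pod's network namespace may behave differently. A small sketch of the same TCP check that could be run from inside a pod (e.g. via kubectl exec into any pod that has Python available):

```python
import socket

def tcp_check(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# From inside a pod, test the service VIP the failing components are using:
# print(tcp_check("10.96.0.1", 443))
```

If this succeeds from the host but times out from a pod, that narrows the problem to the pod-to-service path rather than the API server itself.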
Someone PLEASE save my bank holiday weekend and tell me what is going wrong!
As requested, here is my kubectl get services output:
[root@K8sMaster01 ~]# kubectl get services --all-namespaces
NAMESPACE      NAME                      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)           AGE
default        blackbox-database         ClusterIP   10.110.56.121    <none>        3306/TCP          5h
default        kube-ops-view             ClusterIP   10.105.35.23     <none>        82/TCP            1d
default        kube-ops-view-redis       ClusterIP   10.107.254.193   <none>        6379/TCP          1d
default        kubernetes                ClusterIP   10.96.0.1        <none>        443/TCP           1d
kube-system    heapster                  ClusterIP   10.103.5.79      <none>        80/TCP            1d
kube-system    kube-dns                  ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP     1d
kube-system    kubernetes-dashboard      ClusterIP   10.96.220.152    <none>        443/TCP           1d
kube-system    traefik-ingress-service   ClusterIP   10.102.84.167    <none>        80/TCP,8080/TCP   1d
liab-live-bb   blackbox-application      ClusterIP   10.98.40.25      <none>        8080/TCP          5h
liab-live-bb   blackbox-database         ClusterIP   10.108.43.196    <none>        3306/TCP          5h
Telnet to port 46690:
[root@K8sMaster01 ~]# telnet 10.96.0.7 46690
Trying 10.96.0.7...
(no response)
Today I tried deploying two of my applications to the cluster, as can be seen in the get services output above. The "app" is unable to connect to the "db": it cannot resolve the DB service name. I believe I have a networking issue, though I am not sure whether it is at the host level or within the Kubernetes layer. I did notice my resolv.conf files were not pointing to localhost, and found some changes to make to the CoreDNS config. When I looked at its configuration, it was trying to point to an IPv6 address, so I changed it to this:
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health
        kubernetes cluster.local 10.96.0.0/12 {
           pods insecure
        }
        prometheus :9153
        proxy 10.4.4.28
        cache 30
        reload
    }
kind: ConfigMap
metadata:
  creationTimestamp: 2018-08-27T12:28:57Z
  name: coredns
  namespace: kube-system
  resourceVersion: "174571"
  selfLink: /api/v1/namespaces/kube-system/configmaps/coredns
  uid: c5016361-a9f4-11e8-b0b4-0050569afad9
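On the app-to-db resolution failure specifically: in-cluster DNS names follow the fixed schema &lt;service&gt;.&lt;namespace&gt;.svc.&lt;cluster-domain&gt;, so it may be worth checking exactly which name the app is querying. A small helper (assuming the default cluster.local domain) to build the expected name for the services listed above:

```python
def service_fqdn(service: str, namespace: str,
                 cluster_domain: str = "cluster.local") -> str:
    """Build the fully qualified in-cluster DNS name of a Kubernetes Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"

# The DB service the app should resolve, per the `kubectl get services` output:
print(service_fqdn("blackbox-database", "liab-live-bb"))
# -> blackbox-database.liab-live-bb.svc.cluster.local
```

A pod in the same namespace can also use the short name blackbox-database, which the pod's resolv.conf search path expands to the FQDN above; if the short name fails but the FQDN works, the search path is the problem rather than CoreDNS itself.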