Building a Bare Metal Kubernetes Cluster with kubeadm

8/26/2018

I am trying to build a 3-master, 3-worker Kubernetes cluster with 3 separate etcd servers.

[root@K8sMaster01 ~]# kubectl get nodes
NAME          STATUS    ROLES     AGE       VERSION
k8smaster01   Ready     master    5h        v1.11.1
k8smaster02   Ready     master    4h        v1.11.1
k8smaster03   Ready     master    4h        v1.11.1
k8snode01     Ready     <none>    4h        v1.11.1
k8snode02     Ready     <none>    4h        v1.11.1
k8snode03     Ready     <none>    4h        v1.11.1

I have spent weeks trying to get this to work, but cannot get past one problem:

The containers/pods cannot access the API server.

[root@K8sMaster01 ~]# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-17T18:50:16Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}


[root@K8sMaster01 ~]# cat /etc/redhat-release
Fedora release 28 (Twenty Eight)


[root@K8sMaster01 ~]# uname -a
Linux K8sMaster01 4.16.3-301.fc28.x86_64 #1 SMP Mon Apr 23 21:59:58 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux


All of the pods report Running:

NAME                                   READY     STATUS    RESTARTS   AGE
coredns-78fcdf6894-c2wbh               1/1       Running   1          4h
coredns-78fcdf6894-psbtq               1/1       Running   1          4h
heapster-77f99d6b7c-5pxj6              1/1       Running   0          4h
kube-apiserver-k8smaster01             1/1       Running   1          4h
kube-apiserver-k8smaster02             1/1       Running   1          4h
kube-apiserver-k8smaster03             1/1       Running   1          4h
kube-controller-manager-k8smaster01    1/1       Running   1          4h
kube-controller-manager-k8smaster02    1/1       Running   1          4h
kube-controller-manager-k8smaster03    1/1       Running   1          4h
kube-flannel-ds-amd64-542x6            1/1       Running   0          4h
kube-flannel-ds-amd64-6dw2g            1/1       Running   4          4h
kube-flannel-ds-amd64-h6j9b            1/1       Running   1          4h
kube-flannel-ds-amd64-mgggx            1/1       Running   0          3h
kube-flannel-ds-amd64-p8xfk            1/1       Running   0          4h
kube-flannel-ds-amd64-qp86h            1/1       Running   4          4h
kube-proxy-4bqxh                       1/1       Running   0          3h
kube-proxy-56p4n                       1/1       Running   0          3h
kube-proxy-7z8p7                       1/1       Running   0          3h
kube-proxy-b59ns                       1/1       Running   0          3h
kube-proxy-fc6zg                       1/1       Running   0          3h
kube-proxy-wrxg7                       1/1       Running   0          3h
kube-scheduler-k8smaster01             1/1       Running   1          4h
kube-scheduler-k8smaster02             1/1       Running   1          4h
kube-scheduler-k8smaster03             1/1       Running   1          4h
kubernetes-dashboard-6948bdb78-4f7qj   1/1       Running   19         1h
node-problem-detector-v0.1-77fdw       1/1       Running   0          4h
node-problem-detector-v0.1-96pld       1/1       Running   1          4h
node-problem-detector-v0.1-ctnfn       1/1       Running   0          3h
node-problem-detector-v0.1-q2xvw       1/1       Running   0          4h
node-problem-detector-v0.1-vvf4j       1/1       Running   1          4h
traefik-ingress-controller-7w44f       1/1       Running   0          4h
traefik-ingress-controller-8cprj       1/1       Running   1          4h
traefik-ingress-controller-f6c7q       1/1       Running   0          3h
traefik-ingress-controller-tf8zw       1/1       Running   0          4h
kube-ops-view-6744bdc77d-2x5w8         1/1       Running   0          2h
kube-ops-view-redis-74578dcc5d-5fnvf   1/1       Running   0          2h

The kubernetes-dashboard pod will not stay up (19 restarts so far), and kube-ops-view is in the same state. CoreDNS is also logging errors. All of this looks to me like a networking problem. I have tried:

sudo iptables -P FORWARD ACCEPT
sudo iptables --policy FORWARD ACCEPT
sudo iptables -A FORWARD -o flannel.1 -j ACCEPT
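
For completeness, these are the kernel settings I understand kubeadm expects for pod-to-service traffic to work; the sysctl keys below are my assumption of what matters on Fedora 28, not something I have confirmed fixes anything:

# bridged pod traffic must be visible to iptables, and forwarding must be on
lsmod | grep br_netfilter
sysctl net.bridge.bridge-nf-call-iptables   # should be 1
sysctl net.ipv4.ip_forward                  # should be 1

# current FORWARD policy and rules
iptables -L FORWARD -n --line-numbers | head -n 20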

CoreDNS gives these errors in the logs:

[root@K8sMaster01 ~]#  kubectl logs coredns-78fcdf6894-c2wbh -n kube-system
.:53
2018/08/26 15:15:28 [INFO] CoreDNS-1.1.3
2018/08/26 15:15:28 [INFO] linux/amd64, go1.10.1, b0fd575c
2018/08/26 15:15:28 [INFO] plugin/reload: Running configuration MD5 = 2a066f12ec80aeb2b92740dd74c17138
CoreDNS-1.1.3
linux/amd64, go1.10.1, b0fd575c
E0826 17:12:19.624560       1 reflector.go:322] github.com/coredns/coredns/plugin/kubernetes/controller.go:313: Failed to watch *v1.Service: Get https://10.96.0.1:443/api/v1/services?resourceVersion=556&timeoutSeconds=389&watch=true: dial tcp 10.96.0.1:443: i/o timeout
2018/08/26 17:35:34 [ERROR] 2 kube-ops-view-redis.uk.specsavers.com. A: unreachable backend: read udp 10.96.0.7:46862->10.4.4.28:53: i/o timeout
2018/08/26 17:35:34 [ERROR] 2 kube-ops-view-redis.uk.specsavers.com. AAAA: unreachable backend: read udp 10.96.0.7:46690->10.4.4.28:53: i/o timeout
2018/08/26 17:35:37 [ERROR] 2 kube-ops-view-redis.uk.specsavers.com. AAAA: unreachable backend: read udp 10.96.0.7:60267->10.4.4.28:53: i/o timeout
2018/08/26 17:35:37 [ERROR] 2 kube-ops-view-redis.uk.specsavers.com. A: unreachable backend: read udp 10.96.0.7:41482->10.4.4.28:53: i/o timeout
2018/08/26 17:36:58 [ERROR] 2 kube-ops-view-redis.specsavers.local. AAAA: unreachable backend: read udp 10.96.0.7:58042->10.4.4.28:53: i/o timeout
2018/08/26 17:36:58 [ERROR] 2 kube-ops-view-redis.specsavers.local. A: unreachable backend: read udp 10.96.0.7:53149->10.4.4.28:53: i/o timeout
2018/08/26 17:37:01 [ERROR] 2 kube-ops-view-redis.specsavers.local. A: unreachable backend: read udp 10.96.0.7:36861->10.4.4.28:53: i/o timeout
2018/08/26 17:37:01 [ERROR] 2 kube-ops-view-redis.specsavers.local. AAAA: unreachable backend: read udp 10.96.0.7:43235->10.4.4.28:53: i/o timeout
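
Those errors mix two different failures: the watch against 10.96.0.1 timing out, and the upstream resolver 10.4.4.28 timing out. To separate the two, the upstream can be queried from a node and from a pod; this is only a sketch (dig needs bind-utils on Fedora, and busybox:1.28 is just an example image, not something already in my cluster):

# from the node itself (requires bind-utils)
dig @10.4.4.28 +time=2 +tries=1 kubernetes.io

# from inside the pod network
kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- \
  nslookup kubernetes.io 10.4.4.28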

The dashboard:

[root@K8sMaster01 ~]#  kubectl logs kubernetes-dashboard-6948bdb78-4f7qj -n kube-system
2018/08/26 20:10:31 Starting overwatch
2018/08/26 20:10:31 Using in-cluster config to connect to apiserver
2018/08/26 20:10:31 Using service account token for csrf signing
2018/08/26 20:10:31 No request provided. Skipping authorization
2018/08/26 20:11:01 Error while initializing connection to Kubernetes apiserver. This most likely means that the cluster is misconfigured (e.g., it has invalid apiserver certificates or service accounts configuration) or the --apiserver-host param points to a server that does not exist. Reason: Get https://10.96.0.1:443/version: dial tcp 10.96.0.1:443: i/o timeout
Refer to our FAQ and wiki pages for more information: https://github.com/kubernetes/dashboard/wiki/FAQ
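
For comparison with the dashboard's error, the same URL can be probed from inside any pod that has curl, using the default service account token mount (the paths below are the standard mount locations, as far as I know):

# run inside a pod, e.g. via kubectl exec
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
curl -sk --max-time 5 -H "Authorization: Bearer ${TOKEN}" https://10.96.0.1:443/version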

kube-ops-view:

ERROR:kube_ops_view.update:Failed to query cluster 10-96-0-1:443 (https://10.96.0.1:443): ConnectTimeout (try 141, wait 63 seconds)
10.96.3.1 - - [2018-08-26 20:12:34] "GET /health HTTP/1.1" 200 117 0.001002
10.96.3.1 - - [2018-08-26 20:12:44] "GET /health HTTP/1.1" 200 117 0.000921
10.96.3.1 - - [2018-08-26 20:12:54] "GET /health HTTP/1.1" 200 117 0.000926
10.96.3.1 - - [2018-08-26 20:13:04] "GET /health HTTP/1.1" 200 117 0.000924
10.96.3.1 - - [2018-08-26 20:13:14] "GET /health HTTP/1.1" 200 117 0.000942
10.96.3.1 - - [2018-08-26 20:13:24] "GET /health HTTP/1.1" 200 117 0.000924
10.96.3.1 - - [2018-08-26 20:13:34] "GET /health HTTP/1.1" 200 117 0.000939
ERROR:kube_ops_view.update:Failed to query cluster 10-96-0-1:443 (https://10.96.0.1:443): ConnectTimeout (try 142, wait 61 seconds)

Flannel has created the networks:

[root@K8sMaster01 ~]# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:9a:80:f7 brd ff:ff:ff:ff:ff:ff
    inet 10.34.88.182/24 brd 10.34.88.255 scope global dynamic ens192
       valid_lft 7071sec preferred_lft 7071sec
    inet 10.10.40.90/24 brd 10.10.40.255 scope global ens192:1
       valid_lft forever preferred_lft forever
    inet6 fe80::250:56ff:fe9a:80f7/64 scope link
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:cf:ec:b3:ee brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
4: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
    link/ether 06:df:1e:87:b8:ee brd ff:ff:ff:ff:ff:ff
    inet 10.96.0.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet6 fe80::4df:1eff:fe87:b8ee/64 scope link
       valid_lft forever preferred_lft forever
5: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
    link/ether 0a:58:0a:60:00:01 brd ff:ff:ff:ff:ff:ff
    inet 10.96.0.1/24 scope global cni0
       valid_lft forever preferred_lft forever
    inet6 fe80::8c77:39ff:fe6e:8710/64 scope link
       valid_lft forever preferred_lft forever
7: veth9527916b@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP group default
    link/ether 46:62:b6:b8:b9:ac brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::4462:b6ff:feb8:b9ac/64 scope link
       valid_lft forever preferred_lft forever
8: veth6e6f08f5@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP group default
    link/ether 3e:a5:4b:8d:11:ce brd ff:ff:ff:ff:ff:ff link-netnsid 2
    inet6 fe80::3ca5:4bff:fe8d:11ce/64 scope link
       valid_lft forever preferred_lft forever
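
In case the address ranges are part of the problem, these are the places I know of to check which CIDRs flannel and the API server think they own (the file and ConfigMap names come from the standard flannel manifest, so they are an assumption about this setup):

cat /run/flannel/subnet.env
kubectl -n kube-system get configmap kube-flannel-cfg -o yaml | grep -A5 net-conf.json
kubectl -n kube-system get pod kube-apiserver-k8smaster01 -o yaml | grep service-cluster-ip-range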

I can ping the IP:

[root@K8sMaster01 ~]# ping 10.96.0.1
PING 10.96.0.1 (10.96.0.1) 56(84) bytes of data.
64 bytes from 10.96.0.1: icmp_seq=1 ttl=64 time=0.052 ms
64 bytes from 10.96.0.1: icmp_seq=2 ttl=64 time=0.032 ms
64 bytes from 10.96.0.1: icmp_seq=3 ttl=64 time=0.042 ms

and telnet to the port:

[root@K8sMaster01 ~]# telnet 10.96.0.1 443
Trying 10.96.0.1...
Connected to 10.96.0.1.
Escape character is '^]'.

Someone PLEASE save my bank holiday weekend and tell me what is going wrong!

As requested, here is my kubectl get services output:

[root@K8sMaster01 ~]# kubectl get services --all-namespaces
NAMESPACE      NAME                      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)           AGE
default        blackbox-database         ClusterIP   10.110.56.121    <none>        3306/TCP          5h
default        kube-ops-view             ClusterIP   10.105.35.23     <none>        82/TCP            1d
default        kube-ops-view-redis       ClusterIP   10.107.254.193   <none>        6379/TCP          1d
default        kubernetes                ClusterIP   10.96.0.1        <none>        443/TCP           1d
kube-system    heapster                  ClusterIP   10.103.5.79      <none>        80/TCP            1d
kube-system    kube-dns                  ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP     1d
kube-system    kubernetes-dashboard      ClusterIP   10.96.220.152    <none>        443/TCP           1d
kube-system    traefik-ingress-service   ClusterIP   10.102.84.167    <none>        80/TCP,8080/TCP   1d
liab-live-bb   blackbox-application      ClusterIP   10.98.40.25      <none>        8080/TCP          5h
liab-live-bb   blackbox-database         ClusterIP   10.108.43.196    <none>        3306/TCP          5h

Telnet to port 46690:

[root@K8sMaster01 ~]# telnet 10.96.0.7 46690
Trying 10.96.0.7...

(no response)

Today I tried deploying two of my applications to the cluster, as can be seen in the get services output above. The "app" is unable to connect to the "db": it cannot resolve the DB service name. I believe I have a networking issue, but I am not sure whether it is at the host level or within the Kubernetes layer. I did notice that my resolv.conf files were not pointing to localhost, and found some changes to make to the CoreDNS config. When I looked at its configuration it was pointing at an IPv6 address, so I changed it to this:

apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health
        kubernetes cluster.local 10.96.0.0/12 {
           pods insecure
        }
        prometheus :9153
        proxy 10.4.4.28
        cache 30
        reload
    }
kind: ConfigMap
metadata:
  creationTimestamp: 2018-08-27T12:28:57Z
  name: coredns
  namespace: kube-system
  resourceVersion: "174571"
  selfLink: /api/v1/namespaces/kube-system/configmaps/coredns
  uid: c5016361-a9f4-11e8-b0b4-0050569afad9
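
To see whether in-cluster names resolve after that change, a one-off pod can query the service name directly (the name below is taken from my get services output; busybox:1.28 is just an example image):

kubectl run -it --rm dns-check --image=busybox:1.28 --restart=Never -- \
  nslookup blackbox-database.liab-live-bb.svc.cluster.local
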
-- Lord Riley
Tags: flannel, kubeadm, kubectl, kubernetes
