I have deployed a "working" Kubernetes cluster on OpenStack, using the Horizon interface to create the Linux instances and configuring the hosts according to: https://kubernetes.io/docs/setup/independent/high-availability/
All masters and nodes report Ready:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
kube-apiserver-1 Ready master 1d v1.12.2
kube-apiserver-2 Ready master 1d v1.12.2
kube-apiserver-3 Ready master 1d v1.12.2
kube-node-1 Ready <none> 21h v1.12.2
kube-node-2 Ready <none> 21h v1.12.2
kube-node-3 Ready <none> 21h v1.12.2
kube-node-4 Ready <none> 21h v1.12.2
However, getting beyond this point has proven to be quite a struggle. I cannot create usable services, and CoreDNS, which is an essential component, is stuck in CrashLoopBackOff:
$ kubectl -n kube-system get pods
NAME READY STATUS RESTARTS AGE
coredns-576cbf47c7-4gdnc 0/1 CrashLoopBackOff 288 23h
coredns-576cbf47c7-x4h4v 0/1 CrashLoopBackOff 288 23h
kube-apiserver-kube-apiserver-1 1/1 Running 0 1d
kube-apiserver-kube-apiserver-2 1/1 Running 0 1d
kube-apiserver-kube-apiserver-3 1/1 Running 0 1d
kube-controller-manager-kube-apiserver-1 1/1 Running 3 1d
kube-controller-manager-kube-apiserver-2 1/1 Running 1 1d
kube-controller-manager-kube-apiserver-3 1/1 Running 0 1d
kube-flannel-ds-amd64-2zdtd 1/1 Running 0 20h
kube-flannel-ds-amd64-7l5mr 1/1 Running 0 20h
kube-flannel-ds-amd64-bmvs9 1/1 Running 0 1d
kube-flannel-ds-amd64-cmhkg 1/1 Running 0 1d
...
The pod logs show that it cannot reach the kubernetes service at 10.96.0.1:443:
$ kubectl -n kube-system logs coredns-576cbf47c7-4gdnc
E1121 18:04:48.928055 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:355: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1121 18:04:48.928688 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:348: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1121 18:04:48.928917 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:350: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1121 18:05:19.929869 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:355: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1121 18:05:19.930819 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:348: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1121 18:05:19.931517 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:350: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1121 18:05:50.932159 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:355: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1121 18:05:50.932722 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:348: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1121 18:05:50.933179 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:350: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
2018/11/21 18:06:07 [INFO] SIGTERM: Shutting down servers then terminating
E1121 18:06:21.933058 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:355: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1121 18:06:21.934010 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:348: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1121 18:06:21.935107 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:350: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
$ kubectl -n kube-system describe pod/coredns-576cbf47c7-dk7sh
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 25m default-scheduler Successfully assigned kube-system/coredns-576cbf47c7-dk7sh to kube-node-3
Normal Pulling 25m kubelet, kube-node-3 pulling image "k8s.gcr.io/coredns:1.2.2"
Normal Pulled 25m kubelet, kube-node-3 Successfully pulled image "k8s.gcr.io/coredns:1.2.2"
Normal Created 20m (x3 over 25m) kubelet, kube-node-3 Created container
Normal Killing 20m (x2 over 22m) kubelet, kube-node-3 Killing container with id docker://coredns:Container failed liveness probe.. Container will be killed and recreated.
Normal Pulled 20m (x2 over 22m) kubelet, kube-node-3 Container image "k8s.gcr.io/coredns:1.2.2" already present on machine
Normal Started 20m (x3 over 25m) kubelet, kube-node-3 Started container
Warning Unhealthy 4m (x36 over 24m) kubelet, kube-node-3 Liveness probe failed: HTTP probe failed with statuscode: 503
Warning BackOff 17s (x22 over 8m) kubelet, kube-node-3 Back-off restarting failed container
The kubernetes service is there and seems to be properly autoconfigured:
$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 23h
$ kubectl describe svc/kubernetes
Name: kubernetes
Namespace: default
Labels: component=apiserver
provider=kubernetes
Annotations: <none>
Selector: <none>
Type: ClusterIP
IP: 10.96.0.1
Port: https 443/TCP
TargetPort: 6443/TCP
Endpoints: 192.168.5.19:6443,192.168.5.24:6443,192.168.5.29:6443
Session Affinity: None
Events: <none>
$ kubectl get endpoints
NAME ENDPOINTS AGE
kubernetes 192.168.5.19:6443,192.168.5.24:6443,192.168.5.29:6443 23h
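A rough way to narrow this down, offered here as a sketch rather than something tried above, is to probe the service IP and each endpoint behind it from one of the worker nodes. If the endpoints answer but 10.96.0.1:443 times out, that would point at the service routing (kube-proxy or the pod network) rather than at the API servers themselves. The function name `probe` and the `DRY_RUN` switch are made up for this example; any HTTP response, even a 401 or 403, counts as reachable.

```shell
#!/bin/sh
# Sketch: check whether a host:port answers HTTPS at all.
# Intended to be run on a worker node; targets are passed as arguments.
DRY_RUN="${DRY_RUN:-0}"   # hypothetical switch: 1 = only print the curl command

probe() {
    # $1 = host:port of the service IP or of an API server endpoint
    if [ "$DRY_RUN" = "1" ]; then
        echo "curl -sk --connect-timeout 5 --max-time 10 -o /dev/null https://$1/version"
        return 0
    fi
    # No -f flag: curl succeeds on any HTTP status, so only connection
    # failures and timeouts are reported as "NOT reachable".
    if curl -sk --connect-timeout 5 --max-time 10 -o /dev/null "https://$1/version"; then
        echo "$1 reachable"
    else
        echo "$1 NOT reachable"
    fi
}

# Probe every target given on the command line (does nothing without arguments).
for target in "$@"; do
    probe "$target"
done
```

Saved as, say, probe.sh, it could be called with the addresses from the output above: sh probe.sh 10.96.0.1:443 192.168.5.19:6443 192.168.5.24:6443 192.168.5.29:6443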
I have a nagging suspicion that I am missing something in the network layer and that this issue has something to do with Neutron. There are plenty of HOWTOs on installing Kubernetes with other tools and on installing it in OpenStack, but I have yet to find a guide that explains how to install it on KVM instances created through the Horizon interface, including how to deal with security groups and network issues. For what it's worth, all IPv4/TCP ports are open between the masters and nodes.
Is there anyone out there with a guide that explains this scenario?
The issue here turned out to be a polluted etcd cluster. As soon as I rebuilt the external etcd cluster and started from scratch using these instructions: https://kubernetes.io/docs/setup/independent/high-availability/#external-etcd everything worked as expected. There does not seem to be a tool available to reset the etcd entries for a flannel pod network.
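For a cluster with an external etcd, the stale state can in principle also be cleared by hand instead of rebuilding the etcd hosts. The following is only a sketch of that idea, not the procedure used above: it deletes every key through the etcd v3 API with etcdctl. The endpoint and the certificate paths are assumed placeholders that will differ per installation, and the function name `etcd_wipe` and the `DRY_RUN` switch are made up for this example. By default the script only prints the command, because the real run destroys all cluster state.

```shell
#!/bin/sh
# Sketch: wipe all keys from an external etcd cluster (etcd v3 API).
# Every value below is an assumption; adjust it to your environment.
ETCD_ENDPOINTS="${ETCD_ENDPOINTS:-https://127.0.0.1:2379}"
ETCD_CACERT="${ETCD_CACERT:-/etc/kubernetes/pki/etcd/ca.crt}"
ETCD_CERT="${ETCD_CERT:-/etc/kubernetes/pki/apiserver-etcd-client.crt}"
ETCD_KEY="${ETCD_KEY:-/etc/kubernetes/pki/apiserver-etcd-client.key}"
DRY_RUN="${DRY_RUN:-1}"   # hypothetical switch: 1 = only print, 0 = really delete

etcd_wipe() {
    if [ "$DRY_RUN" = "1" ]; then
        # Show what would be executed without touching etcd.
        echo "ETCDCTL_API=3 etcdctl --endpoints=$ETCD_ENDPOINTS --cacert=$ETCD_CACERT --cert=$ETCD_CERT --key=$ETCD_KEY del '' --prefix"
    else
        # An empty key combined with --prefix matches every key in the store.
        ETCDCTL_API=3 etcdctl \
            --endpoints="$ETCD_ENDPOINTS" \
            --cacert="$ETCD_CACERT" \
            --cert="$ETCD_CERT" \
            --key="$ETCD_KEY" \
            del "" --prefix
    fi
}

etcd_wipe
```

After a real wipe (DRY_RUN=0) the old cluster no longer exists as far as etcd is concerned, so the masters and nodes would presumably still need a kubeadm reset and a fresh kubeadm init and join, as in the guide linked above.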