Kubernetes using KVM instances on OpenStack via KubeAdm

11/21/2018

I have successfully deployed a "working" Kubernetes cluster using the Horizon interface to create the Linux instances:

(screenshot of the KVM instances created through the Horizon interface)

I configured the hosts according to https://kubernetes.io/docs/setup/independent/high-availability/.
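
Roughly, the flow in that guide (with an external etcd cluster, as in my setup) looks like this. The commands below are illustrative rather than my exact configuration; the kubeadm config file is the one from the guide that points at the load balancer address, the external etcd endpoints, and the flannel pod CIDR:

# on the first master, initialize the control plane from the kubeadm config file
$ sudo kubeadm init --config kubeadm-config.yaml

# install the flannel pod-network add-on (manifest from the flannel project)
$ kubectl apply -f kube-flannel.yml

# repeat the init on the remaining masters (after copying the shared certificates),
# then join each worker with the command printed by kubeadm init
$ sudo kubeadm join <load-balancer>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>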

I can now say I have a Kubernetes cluster:

$ kubectl get nodes
NAME               STATUS    ROLES     AGE       VERSION
kube-apiserver-1   Ready     master    1d        v1.12.2
kube-apiserver-2   Ready     master    1d        v1.12.2
kube-apiserver-3   Ready     master    1d        v1.12.2
kube-node-1        Ready     <none>    21h       v1.12.2
kube-node-2        Ready     <none>    21h       v1.12.2
kube-node-3        Ready     <none>    21h       v1.12.2
kube-node-4        Ready     <none>    21h       v1.12.2

However, getting beyond this point has proven to be quite a struggle. I cannot create usable services, and CoreDNS, which is an essential component, seems unusable:

$ kubectl -n kube-system get pods
NAME                                       READY     STATUS             RESTARTS   AGE
coredns-576cbf47c7-4gdnc                   0/1       CrashLoopBackOff   288        23h
coredns-576cbf47c7-x4h4v                   0/1       CrashLoopBackOff   288        23h
kube-apiserver-kube-apiserver-1            1/1       Running            0          1d
kube-apiserver-kube-apiserver-2            1/1       Running            0          1d
kube-apiserver-kube-apiserver-3            1/1       Running            0          1d
kube-controller-manager-kube-apiserver-1   1/1       Running            3          1d
kube-controller-manager-kube-apiserver-2   1/1       Running            1          1d
kube-controller-manager-kube-apiserver-3   1/1       Running            0          1d
kube-flannel-ds-amd64-2zdtd                1/1       Running            0          20h
kube-flannel-ds-amd64-7l5mr                1/1       Running            0          20h
kube-flannel-ds-amd64-bmvs9                1/1       Running            0          1d
kube-flannel-ds-amd64-cmhkg                1/1       Running            0          1d
...

Errors in the pod logs indicate that it cannot reach the kubernetes service:

$ kubectl -n kube-system logs coredns-576cbf47c7-4gdnc
E1121 18:04:48.928055       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:355: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1121 18:04:48.928688       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:348: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1121 18:04:48.928917       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:350: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1121 18:05:19.929869       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:355: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1121 18:05:19.930819       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:348: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1121 18:05:19.931517       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:350: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1121 18:05:50.932159       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:355: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1121 18:05:50.932722       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:348: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1121 18:05:50.933179       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:350: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
2018/11/21 18:06:07 [INFO] SIGTERM: Shutting down servers then terminating
E1121 18:06:21.933058       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:355: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1121 18:06:21.934010       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:348: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1121 18:06:21.935107       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:350: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout

$ kubectl -n kube-system describe pod/coredns-576cbf47c7-dk7sh

...
Events:
  Type     Reason     Age                From                  Message
  ----     ------     ----               ----                  -------
  Normal   Scheduled  25m                default-scheduler     Successfully assigned kube-system/coredns-576cbf47c7-dk7sh to kube-node-3
  Normal   Pulling    25m                kubelet, kube-node-3  pulling image "k8s.gcr.io/coredns:1.2.2"
  Normal   Pulled     25m                kubelet, kube-node-3  Successfully pulled image "k8s.gcr.io/coredns:1.2.2"
  Normal   Created    20m (x3 over 25m)  kubelet, kube-node-3  Created container
  Normal   Killing    20m (x2 over 22m)  kubelet, kube-node-3  Killing container with id docker://coredns:Container failed liveness probe.. Container will be killed and recreated.
  Normal   Pulled     20m (x2 over 22m)  kubelet, kube-node-3  Container image "k8s.gcr.io/coredns:1.2.2" already present on machine
  Normal   Started    20m (x3 over 25m)  kubelet, kube-node-3  Started container
  Warning  Unhealthy  4m (x36 over 24m)  kubelet, kube-node-3  Liveness probe failed: HTTP probe failed with statuscode: 503
  Warning  BackOff    17s (x22 over 8m)  kubelet, kube-node-3  Back-off restarting failed container
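
Checks along these lines help separate "CoreDNS is broken" from "pods cannot reach the ClusterIP at all" (the pod name and image tag below are arbitrary):

# from inside the pod network: an immediate HTTP/TLS error means 10.96.0.1:443 is
# reachable, while an i/o timeout matches the failures in the CoreDNS logs above
$ kubectl run net-test -it --rm --restart=Never --image=busybox:1.28 -- wget -T 5 -O- http://10.96.0.1:443/version

# from a node itself (host network namespace): even a 403 response proves the VIP is
# reachable, whereas a timeout means the traffic is not being forwarded
$ curl -k https://10.96.0.1/version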

The kubernetes service is there and seems to be properly autoconfigured:

$ kubectl get svc

NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   23h

$ kubectl describe svc/kubernetes

Name:              kubernetes
Namespace:         default
Labels:            component=apiserver
                   provider=kubernetes
Annotations:       <none>
Selector:          <none>
Type:              ClusterIP
IP:                10.96.0.1
Port:              https  443/TCP
TargetPort:        6443/TCP
Endpoints:         192.168.5.19:6443,192.168.5.24:6443,192.168.5.29:6443
Session Affinity:  None
Events:            <none>

$ kubectl get endpoints

NAME         ENDPOINTS                                               AGE
kubernetes   192.168.5.19:6443,192.168.5.24:6443,192.168.5.29:6443   23h
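
Since the Service and Endpoints objects look correct, the usual next suspects are kube-proxy (which translates 10.96.0.1 into NAT rules on every node) and the flannel overlay itself. Illustrative checks (the pod name is a placeholder):

$ kubectl -n kube-system get pods -l k8s-app=kube-proxy -o wide
$ kubectl -n kube-system logs <kube-proxy-pod-on-kube-node-3>

# on the node itself, assuming kube-proxy runs in its default iptables mode
$ sudo iptables -t nat -L KUBE-SERVICES -n | grep 10.96.0.1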

I have a nagging suspicion that I am missing something in the network layer and that this issue has something to do with Neutron. There are plenty of HOWTOs on installing Kubernetes with other tools, and on installing it on OpenStack, but I have yet to find a guide that explains how to install it by creating KVM instances through the Horizon interface and dealing with the security groups and network issues involved. By the way, ALL IPv4/TCP ports are open between the Masters and Nodes.
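
For reference, flannel's default VXLAN backend encapsulates pod-to-pod traffic in UDP on port 8472, so TCP-only security group rules do not cover the overlay. The Neutron-side settings in play look roughly like this (the group name, port ID and CIDR below are placeholders):

# allow all TCP plus the VXLAN UDP port between instances sharing a "k8s" security group
$ openstack security group rule create --protocol tcp --dst-port 1:65535 --remote-group k8s k8s
$ openstack security group rule create --protocol udp --dst-port 8472 --remote-group k8s k8s

# if Neutron port security drops traffic sourced from the pod or service CIDRs,
# those ranges can be whitelisted on each instance port
$ openstack port set <port-id> --allowed-address ip-address=10.244.0.0/16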

Is there anyone out there with a guide that explains this scenario?

-- Daniel Maldonado
kubeadm
kubernetes
openstack-horizon
openstack-neutron

1 Answer

11/22/2018

The issue here was a polluted etcd cluster. As soon as I rebuilt the EXTERNAL etcd cluster and started from scratch using these instructions: https://kubernetes.io/docs/setup/independent/high-availability/#external-etcd, everything worked as expected. There does not seem to be a tool available to reset the etcd entries for a flannel pod network.
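
For anyone hitting the same wall, "started from scratch" amounted to roughly the following; the paths, unit names and certificate locations are illustrative and will differ per setup:

# on each external etcd host: stop etcd and wipe its data directory, then
# re-bootstrap the cluster exactly as the linked guide describes
$ sudo systemctl stop etcd
$ sudo rm -rf /var/lib/etcd/*

# on every master and worker: tear down the old kubeadm state before re-initializing
$ sudo kubeadm reset

# sanity-check the fresh etcd cluster before pointing kubeadm at it again
$ ETCDCTL_API=3 etcdctl --endpoints=https://<etcd-0>:2379 --cacert=/etc/etcd/ca.pem --cert=/etc/etcd/client.pem --key=/etc/etcd/client-key.pem member list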

-- Daniel Maldonado
Source: StackOverflow