My goal is to model a hybrid/heterogeneous Kubernetes cluster, where I have the following setup:
Running a Kubernetes cluster with three VMs locally on my laptop is no problem and works fine with Weave Net. However, there seem to be some communication problems when I model my Kubernetes cluster as depicted above.
Since Kubernetes is designed to run with all nodes located in the same network, I set up an OpenVPN server on AWS and connected both my laptop and the Raspberry Pi to it. I was hoping this would be enough to run Kubernetes in a heterogeneous setup where the slave nodes are in a different network. Of course, this was an incorrect assumption.
If I run the Kubernetes dashboard on a slave node and try to access it, I get a timeout. If I run it on the Master node, everything works as expected.
I set up the cluster on AWS with kubeadm init --apiserver-advertise-address= and joined the nodes with kubeadm join.
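For reference, the general shape of the commands was roughly as follows (the address, token, and hash are placeholders, not the actual values):
# on the AWS master
$ sudo kubeadm init --apiserver-advertise-address=<PUBLIC_IP>
# on each node (laptop VM and Raspberry Pi), using the values printed by kubeadm init
$ sudo kubeadm join <PUBLIC_IP>:6443 --token <TOKEN> --discovery-token-ca-cert-hash sha256:<HASH>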
$ kubectl get pods --all-namespaces -o wide:
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE
kube-system etcd-ip-172-31-28-6 1/1 Running 0 5m 172.31.28.6 ip-172-31-28-6
kube-system kube-apiserver-ip-172-31-28-6 1/1 Running 0 5m 172.31.28.6 ip-172-31-28-6
kube-system kube-controller-manager-ip-172-31-28-6 1/1 Running 0 5m 172.31.28.6 ip-172-31-28-6
kube-system kube-dns-6f4fd4bdf-w6ctf 0/3 ContainerCreating 0 15h <none> osboxes
kube-system kube-proxy-2pl2f 1/1 Running 0 15h 172.31.28.6 ip-172-31-28-6
kube-system kube-proxy-7b89c 0/1 CrashLoopBackOff 15 15h 192.168.2.106 edge-1
kube-system kube-proxy-qg69g 1/1 Running 1 15h 10.0.2.15 osboxes
kube-system kube-scheduler-ip-172-31-28-6 1/1 Running 0 5m 172.31.28.6 ip-172-31-28-6
kube-system weave-net-pqxfp 1/2 CrashLoopBackOff 189 15h 172.31.28.6 ip-172-31-28-6
kube-system weave-net-thhzr 1/2 CrashLoopBackOff 12 36m 192.168.2.106 edge-1
kube-system weave-net-v69hj 2/2 Running 7 15h 10.0.2.15 osboxes
$ kubectl -n kube-system logs --v=7 kube-dns-6f4fd4bdf-w6ctf -c kubedns:
...
I0321 09:04:25.620580 23936 round_trippers.go:414] GET https://<PUBLIC_IP>:6443/api/v1/namespaces/kube-system/pods/kube-dns-6f4fd4bdf-w6ctf/log?container=kubedns
I0321 09:04:25.620605 23936 round_trippers.go:421] Request Headers:
I0321 09:04:25.620611 23936 round_trippers.go:424] Accept: application/json, */*
I0321 09:04:25.620616 23936 round_trippers.go:424] User-Agent: kubectl/v1.9.4 (linux/amd64) kubernetes/bee2d15
I0321 09:04:25.713821 23936 round_trippers.go:439] Response Status: 400 Bad Request in 93 milliseconds
I0321 09:04:25.714106 23936 helpers.go:201] server response object: [{
"metadata": {},
"status": "Failure",
"message": "container \"kubedns\" in pod \"kube-dns-6f4fd4bdf-w6ctf\" is waiting to start: ContainerCreating",
"reason": "BadRequest",
"code": 400
}]
F0321 09:04:25.714134 23936 helpers.go:119] Error from server (BadRequest): container "kubedns" in pod "kube-dns-6f4fd4bdf-w6ctf" is waiting to start: ContainerCreating
$ kubectl -n kube-system logs --v=7 kube-proxy-7b89c:
...
I0321 09:06:51.803852 24289 round_trippers.go:414] GET https://<PUBLIC_IP>:6443/api/v1/namespaces/kube-system/pods/kube-proxy-7b89c/log
I0321 09:06:51.803879 24289 round_trippers.go:421] Request Headers:
I0321 09:06:51.803891 24289 round_trippers.go:424] User-Agent: kubectl/v1.9.4 (linux/amd64) kubernetes/bee2d15
I0321 09:06:51.803900 24289 round_trippers.go:424] Accept: application/json, */*
I0321 09:08:59.110869 24289 round_trippers.go:439] Response Status: 500 Internal Server Error in 127306 milliseconds
I0321 09:08:59.111129 24289 helpers.go:201] server response object: [{
"metadata": {},
"status": "Failure",
"message": "Get https://192.168.2.106:10250/containerLogs/kube-system/kube-proxy-7b89c/kube-proxy: dial tcp 192.168.2.106:10250: getsockopt: connection timed out",
"code": 500
}]
F0321 09:08:59.111156 24289 helpers.go:119] Error from server: Get https://192.168.2.106:10250/containerLogs/kube-system/kube-proxy-7b89c/kube-proxy: dial tcp 192.168.2.106:10250: getsockopt: connection timed out
$ kubectl -n kube-system logs --v=7 weave-net-pqxfp -c weave:
...
I0321 09:12:08.047206 24847 round_trippers.go:414] GET https://<PUBLIC_IP>:6443/api/v1/namespaces/kube-system/pods/weave-net-pqxfp/log?container=weave
I0321 09:12:08.047233 24847 round_trippers.go:421] Request Headers:
I0321 09:12:08.047335 24847 round_trippers.go:424] Accept: application/json, */*
I0321 09:12:08.047347 24847 round_trippers.go:424] User-Agent: kubectl/v1.9.4 (linux/amd64) kubernetes/bee2d15
I0321 09:12:08.062494 24847 round_trippers.go:439] Response Status: 200 OK in 15 milliseconds
DEBU: 2018/03/21 09:11:26.847013 [kube-peers] Checking peer "fa:10:a4:97:7e:7b" against list &{[{6e:fd:f4:ef:1e:f5 osboxes}]}
Peer not in list; removing persisted data
INFO: 2018/03/21 09:11:26.880946 Command line options: map[expect-npc:true ipalloc-init:consensus=3 db-prefix:/weavedb/weave-net http-addr:127.0.0.1:6784 ipalloc-range:10.32.0.0/12 nickname:ip-172-31-28-6 host-root:/host name:fa:10:a4:97:7e:7b no-dns:true status-addr:0.0.0.0:6782 datapath:datapath docker-api: port:6783 conn-limit:30]
INFO: 2018/03/21 09:11:26.880995 weave 2.2.1
FATA: 2018/03/21 09:11:26.881117 Inconsistent bridge state detected. Please do 'weave reset' and try again
$ kubectl -n kube-system logs --v=7 weave-net-thhzr -c weave:
...
I0321 09:15:13.787905 25113 round_trippers.go:414] GET https://<PUBLIC_IP>:6443/api/v1/namespaces/kube-system/pods/weave-net-thhzr/log?container=weave
I0321 09:15:13.787932 25113 round_trippers.go:421] Request Headers:
I0321 09:15:13.787938 25113 round_trippers.go:424] Accept: application/json, */*
I0321 09:15:13.787946 25113 round_trippers.go:424] User-Agent: kubectl/v1.9.4 (linux/amd64) kubernetes/bee2d15
I0321 09:17:21.126863 25113 round_trippers.go:439] Response Status: 500 Internal Server Error in 127338 milliseconds
I0321 09:17:21.127140 25113 helpers.go:201] server response object: [{
"metadata": {},
"status": "Failure",
"message": "Get https://192.168.2.106:10250/containerLogs/kube-system/weave-net-thhzr/weave: dial tcp 192.168.2.106:10250: getsockopt: connection timed out",
"code": 500
}]
F0321 09:17:21.127167 25113 helpers.go:119] Error from server: Get https://192.168.2.106:10250/containerLogs/kube-system/weave-net-thhzr/weave: dial tcp 192.168.2.106:10250: getsockopt: connection timed out
$ ifconfig (Kubernetes master on AWS):
datapath Link encap:Ethernet HWaddr ae:90:9a:b2:7e:d9
inet6 addr: fe80::ac90:9aff:feb2:7ed9/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1376 Metric:1
RX packets:29 errors:0 dropped:0 overruns:0 frame:0
TX packets:14 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1
RX bytes:1904 (1.9 KB) TX bytes:1188 (1.1 KB)
docker0 Link encap:Ethernet HWaddr 02:42:50:39:1f:c7
inet addr:172.17.0.1 Bcast:0.0.0.0 Mask:255.255.0.0
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
eth0 Link encap:Ethernet HWaddr 06:a3:d0:8e:19:72
inet addr:172.31.28.6 Bcast:172.31.31.255 Mask:255.255.240.0
inet6 addr: fe80::4a3:d0ff:fe8e:1972/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:9001 Metric:1
RX packets:10323322 errors:0 dropped:0 overruns:0 frame:0
TX packets:9418208 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:3652314289 (3.6 GB) TX bytes:3117288442 (3.1 GB)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:11388236 errors:0 dropped:0 overruns:0 frame:0
TX packets:11388236 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1
RX bytes:2687297929 (2.6 GB) TX bytes:2687297929 (2.6 GB)
tun0 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
inet addr:10.8.0.1 P-t-P:10.8.0.2 Mask:255.255.255.255
UP POINTOPOINT RUNNING NOARP MULTICAST MTU:1500 Metric:1
RX packets:97222 errors:0 dropped:0 overruns:0 frame:0
TX packets:164607 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:100
RX bytes:13381022 (13.3 MB) TX bytes:209129403 (209.1 MB)
vethwe-bridge Link encap:Ethernet HWaddr 12:59:54:73:0f:91
inet6 addr: fe80::1059:54ff:fe73:f91/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1376 Metric:1
RX packets:18 errors:0 dropped:0 overruns:0 frame:0
TX packets:36 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:1476 (1.4 KB) TX bytes:2940 (2.9 KB)
vethwe-datapath Link encap:Ethernet HWaddr 8e:75:1c:92:93:0d
inet6 addr: fe80::8c75:1cff:fe92:930d/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1376 Metric:1
RX packets:36 errors:0 dropped:0 overruns:0 frame:0
TX packets:18 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:2940 (2.9 KB) TX bytes:1476 (1.4 KB)
vxlan-6784 Link encap:Ethernet HWaddr a6:02:da:5e:d5:2a
inet6 addr: fe80::a402:daff:fe5e:d52a/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:65485 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:8 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
$ sudo systemctl status kubelet.service (on AWS):
Mar 21 09:34:59 ip-172-31-28-6 kubelet[19676]: W0321 09:34:59.202058 19676 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
Mar 21 09:34:59 ip-172-31-28-6 kubelet[19676]: E0321 09:34:59.202452 19676 kubelet.go:2109] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Mar 21 09:35:01 ip-172-31-28-6 kubelet[19676]: I0321 09:35:01.535541 19676 kuberuntime_manager.go:514] Container {Name:weave Image:weaveworks/weave-kube:2.2.1 Command:[/home/weave/launch.sh] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[{Name:HOSTNAME Value: ValueFrom:&EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:spec.nodeName,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,}}] Resources:{Limits:map[] Requests:map[cpu:{i:{value:10 scale:-3} d:{Dec:<nil>} s:10m Format:DecimalSI}]} VolumeMounts:[{Name:weavedb ReadOnly:false MountPath:/weavedb SubPath: MountPropagation:<nil>} {Name:cni-bin ReadOnly:false MountPath:/host/opt SubPath: MountPropagation:<nil>} {Name:cni-bin2 ReadOnly:false MountPath:/host/home SubPath: MountPropagation:<nil>} {Name:cni-conf ReadOnly:false MountPath:/host/etc SubPath: MountPropagation:<nil>} {Name:dbus ReadOnly:false MountPath:/host/var/lib/dbus SubPath: MountPropagation:<nil>} {Name:lib-modules ReadOnly:false MountPath:/lib/modules SubPath: MountPropagation:<nil>} {Name:weave-net-token-vn8rh ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil>}] VolumeDevices:[] LivenessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/status,Port:6784,Host:127.0.0.1,Scheme:HTTP,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:30,TimeoutSeconds:1,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:3,} ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:&SecurityContext{Capabilities:nil,Privileged:*true,SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,AllowPrivilegeEscalation:nil,} Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Mar 21 09:35:01 ip-172-31-28-6 kubelet[19676]: I0321 09:35:01.536504 19676 kuberuntime_manager.go:758] checking backoff for container "weave" in pod "weave-net-pqxfp_kube-system(c6450070-2c61-11e8-a50d-06a3d08e1972)"
Mar 21 09:35:01 ip-172-31-28-6 kubelet[19676]: I0321 09:35:01.536636 19676 kuberuntime_manager.go:768] Back-off 5m0s restarting failed container=weave pod=weave-net-pqxfp_kube-system(c6450070-2c61-11e8-a50d-06a3d08e1972)
Mar 21 09:35:01 ip-172-31-28-6 kubelet[19676]: E0321 09:35:01.536664 19676 pod_workers.go:186] Error syncing pod c6450070-2c61-11e8-a50d-06a3d08e1972 ("weave-net-pqxfp_kube-system(c6450070-2c61-11e8-a50d-06a3d08e1972)"), skipping: failed to "StartContainer" for "weave" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=weave pod=weave-net-pqxfp_kube-system(c6450070-2c61-11e8-a50d-06a3d08e1972)"
$ sudo systemctl status kubelet.service (on Laptop):
Mar 21 05:47:18 osboxes kubelet[715]: E0321 05:47:18.662670 715 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Mar 21 05:47:18 osboxes kubelet[715]: E0321 05:47:18.663412 715 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "kube-dns-6f4fd4bdf-w6ctf_kube-system(11886465-2c61-11e8-a50d-06a3d08e1972)" failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Mar 21 05:47:18 osboxes kubelet[715]: E0321 05:47:18.663869 715 kuberuntime_manager.go:647] createPodSandbox for pod "kube-dns-6f4fd4bdf-w6ctf_kube-system(11886465-2c61-11e8-a50d-06a3d08e1972)" failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Mar 21 05:47:18 osboxes kubelet[715]: E0321 05:47:18.664295 715 pod_workers.go:186] Error syncing pod 11886465-2c61-11e8-a50d-06a3d08e1972 ("kube-dns-6f4fd4bdf-w6ctf_kube-system(11886465-2c61-11e8-a50d-06a3d08e1972)"), skipping: failed to "CreatePodSandbox" for "kube-dns-6f4fd4bdf-w6ctf_kube-system(11886465-2c61-11e8-a50d-06a3d08e1972)" with CreatePodSandboxError: "CreatePodSandbox for pod \"kube-dns-6f4fd4bdf-w6ctf_kube-system(11886465-2c61-11e8-a50d-06a3d08e1972)\" failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Mar 21 05:47:20 osboxes kubelet[715]: W0321 05:47:20.536161 715 pod_container_deletor.go:77] Container "bbf490835face43b70c24dbcb67c3f75872e7831b5e2605dc8bb71210910e273" not found in pod's containers
$ sudo systemctl status kubelet.service (on Raspberry Pi):
Mar 21 09:29:01 edge-1 kubelet[339]: I0321 09:29:01.188199 339 kuberuntime_manager.go:514] Container {Name:kube-proxy Image:gcr.io/google_containers/kube-proxy-amd64:v1.9.5 Command:[/usr/local/bin/kube-proxy --config=/var/lib/kube-proxy/config.conf] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[]} VolumeMounts:[{Name:kube-proxy ReadOnly:false MountPath:/var/lib/kube-proxy SubPath: MountPropagation:<nil>} {Name:xtables-lock ReadOnly:false MountPath:/run/xtables.lock SubPath: MountPropagation:<nil>} {Name:lib-modules ReadOnly:true MountPath:/lib/modules SubPath: MountPropagation:<nil>} {Name:kube-proxy-token-px7dt ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil>}] VolumeDevices:[] LivenessProbe:nil ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:&SecurityContext{Capabilities:nil,Privileged:*true,SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,AllowPrivilegeEscalation:nil,} Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Mar 21 09:29:01 edge-1 kubelet[339]: I0321 09:29:01.189023 339 kuberuntime_manager.go:758] checking backoff for container "kube-proxy" in pod "kube-proxy-7b89c_kube-system(5bebafa1-2c61-11e8-a50d-06a3d08e1972)"
Mar 21 09:29:01 edge-1 kubelet[339]: I0321 09:29:01.190174 339 kuberuntime_manager.go:768] Back-off 5m0s restarting failed container=kube-proxy pod=kube-proxy-7b89c_kube-system(5bebafa1-2c61-11e8-a50d-06a3d08e1972)
Mar 21 09:29:01 edge-1 kubelet[339]: E0321 09:29:01.190518 339 pod_workers.go:186] Error syncing pod 5bebafa1-2c61-11e8-a50d-06a3d08e1972 ("kube-proxy-7b89c_kube-system(5bebafa1-2c61-11e8-a50d-06a3d08e1972)"), skipping: failed to "StartContainer" for "kube-proxy" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=kube-proxy pod=kube-proxy-7b89c_kube-system(5bebafa1-2c61-11e8-a50d-06a3d08e1972)"
Mar 21 09:29:02 edge-1 kubelet[339]: W0321 09:29:02.278342 339 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
Mar 21 09:29:02 edge-1 kubelet[339]: E0321 09:29:02.282534 339 kubelet.go:2120] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
FATA: 2018/03/21 09:11:26.881117 Inconsistent bridge state detected. Please do 'weave reset' and try again
Since it's slightly complicated to run the weave command on a Kubernetes node, just reboot the node and the bridge should be recreated from scratch.
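If you prefer not to reboot, a rough sketch of the manual alternative (assuming you can SSH into the node and the weave script is not installed there yet):
$ sudo curl -L git.io/weave -o /usr/local/bin/weave
$ sudo chmod a+x /usr/local/bin/weave
$ sudo weave reset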
F0321 09:08:59.111156 24289 helpers.go:119] Error from server: Get https://192.168.2.106:10250/containerLogs/kube-system/kube-proxy-7b89c/kube-proxy: dial tcp 192.168.2.106:10250: getsockopt: connection timed out
Consider whether those hosts can reach each other on their regular network.
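For example, from the master you can check whether the kubelet port on edge-1 is reachable at all (the address 192.168.2.106 and port 10250 are taken from the error above):
$ ping -c 3 192.168.2.106
$ nc -vz -w 5 192.168.2.106 10250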
You definitely have a problem with networking between the Kubernetes master and the nodes.
But first of all, this kind of hybrid installation is not the best idea. You need stable networking between the master(s) and the nodes, or it will cause many problems, and that is hard to achieve over an Internet connection.
If you want a hybrid installation, you can use Federation between a Kubernetes cluster in AWS and one on your local hardware.
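As a rough illustration with the kubefed tool (all names are placeholders; this is only a sketch, not a full guide):
$ kubefed init myfed --host-cluster-context=<aws-context> --dns-provider="aws-route53" --dns-zone-name="example.com."
$ kubefed join local --host-cluster-context=<aws-context> --cluster-context=<local-context>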
But regarding your problem: I see that you have a problem with Weave Net on the master and on the edge-1 node. It is not clear from the logs what kind of problem it is; try running the Weave container with the WEAVE_DEBUG=1 environment variable. Without working networking, other pods like kube-dns will not work properly.
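One way to set it, as a sketch (assuming the default weave-net DaemonSet in kube-system; the weave container already has an env list, so you add an entry to it):
$ kubectl -n kube-system edit daemonset weave-net
# under the weave container, add to env:
- name: WEAVE_DEBUG
  value: "1"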
Also, how did you set up OpenVPN? You need routing between the AWS subnet and the VPN clients, and client-to-client routing as well. All addresses that you use to set up the cluster on the nodes have to be routable between each other. Check one more time which addresses the Kubernetes components and Weave are bound to, and whether those addresses are routable.
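As a rough sketch, the relevant OpenVPN server directives would look something like this (the subnet below is taken from your ifconfig output and is only an example; adjust it to your actual networks):
# in the OpenVPN server.conf on AWS
client-to-client                        # let VPN clients (laptop, Pi) reach each other
push "route 172.31.16.0 255.255.240.0"  # push the AWS VPC subnet to the clients
# then, on each node, verify the routes and reachability over the tunnel
$ ip route
$ ping -c 3 10.8.0.1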