k3s - can't access from one pod to another if pods on different nodes
Update:
I've narrowed the issue down: pods on the other master node can't communicate with pods on the original master.
Pods on rpi4-server1 (the original cluster) can communicate with pods on rpi-worker01 and rpi3-worker02.
Pods on rpi4-server2 are unable to communicate with the others.
I'm trying to run a high-availability cluster with the embedded DB, using flannel with vxlan.
I'm trying to set up a project with 5 services in k3s.
When all of the pods are contained on a single node, they work together fine.
As soon as I add other nodes into the system and pods are deployed to them, the links seem to break.
In troubleshooting, I've exec'd into one of the pods and tried to curl another. When both are on the same node this works; when the second service is on another node it doesn't.
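For reference, the check described above looks roughly like the sketch below. The pod name, pod IP and port are taken from the listings further down in this post. The helper name probe_cmd is hypothetical, and the script assumes the image ships curl, that the container listens on the same port as its service, and that "/" is a valid path.

```shell
#!/bin/sh
# Sketch of the cross-node check: curl one pod's IP from inside another pod.
# Pod name, IP and port come from the listings in this post. Assumptions:
# curl exists in the image, the container listens on the service port,
# and "/" is a path the service answers on.

NAMESPACE="test-demo"
SRC_POD="blah-server-858ccd7788-mnf67"  # scheduled on rpi4-server1
DST_IP="10.42.1.29"                     # test-apis pod on rpi4-server2
DST_PORT="8075"                         # test-apis service port

# probe_cmd is a hypothetical helper: it only builds the command string.
probe_cmd() {
  printf 'sudo kubectl exec -n %s %s -- curl -sS -m 5 http://%s:%s/\n' "$1" "$2" "$3" "$4"
}

CMD="$(probe_cmd "$NAMESPACE" "$SRC_POD" "$DST_IP" "$DST_PORT")"
echo "$CMD"

# Nothing touches the cluster unless RUN=1 is set.
if [ "${RUN:-0}" = "1" ]; then
  sh -c "$CMD"
fi
```

Swapping DST_IP for a pod on the same node as SRC_POD (for example 10.42.0.32, the n34 pod) gives the same-node comparison case.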
I'm sure this is something simple that I'm missing, but I can't work it out! Help appreciated.
Key details:
Two rpi4s as servers (high availability) and two rpi3s as worker nodes
metallb as loadbalancer
Two services, blah-interface and blah-svc, are configured as LoadBalancer to allow external access. The others (blah-server, n34 and test-apis) are configured as NodePort to support debugging, but only really need internal access.
Info on nodes, pods and services:
pi@rpi4-server1:~/Projects/test_demo_2020/test_kube_config/testchart/templates $ sudo kubectl get nodes --all-namespaces -o wide
NAME            STATUS                     ROLES    AGE   VERSION         INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                         KERNEL-VERSION   CONTAINER-RUNTIME
rpi4-server1    Ready                      master   11h   v1.17.0+k3s.1   192.168.0.140   <none>        Raspbian GNU/Linux 10 (buster)   4.19.75-v7l+     docker://19.3.5
rpi-worker01    Ready,SchedulingDisabled   <none>   10h   v1.17.0+k3s.1   192.168.0.41    <none>        Raspbian GNU/Linux 10 (buster)   4.19.66-v7+      containerd://1.3.0-k3s.5
rpi3-worker02   Ready,SchedulingDisabled   <none>   10h   v1.17.0+k3s.1   192.168.0.142   <none>        Raspbian GNU/Linux 10 (buster)   4.19.75-v7+      containerd://1.3.0-k3s.5
rpi4-server2    Ready                      master   10h   v1.17.0+k3s.1   192.168.0.143   <none>        Raspbian GNU/Linux 10 (buster)   4.19.75-v7l+     docker://19.3.5
pi@rpi4-server1:~/Projects/test_demo_2020/test_kube_config/testchart/templates $ sudo kubectl get pods --all-namespaces -o wide
NAMESPACE        NAME                                      READY   STATUS      RESTARTS   AGE     IP              NODE            NOMINATED NODE   READINESS GATES
kube-system      helm-install-traefik-l2z6l                0/1     Completed   2          11h     10.42.0.2       rpi4-server1    <none>           <none>
test-demo        n34-5c7b9475cb-zjlgl                      1/1     Running     1          4h30m   10.42.0.32      rpi4-server1    <none>           <none>
kube-system      metrics-server-6d684c7b5-5wgf9            1/1     Running     3          11h     10.42.0.26      rpi4-server1    <none>           <none>
metallb-system   speaker-62rkm                             0/1     Pending     0          99m     <none>          rpi-worker01    <none>           <none>
metallb-system   speaker-2shzq                             0/1     Pending     0          99m     <none>          rpi3-worker02   <none>           <none>
metallb-system   speaker-2mcnt                             1/1     Running     0          99m     192.168.0.143   rpi4-server2    <none>           <none>
metallb-system   speaker-v8j9g                             1/1     Running     0          99m     192.168.0.140   rpi4-server1    <none>           <none>
metallb-system   controller-65895b47d4-pgcs6               1/1     Running     0          90m     10.42.0.49      rpi4-server1    <none>           <none>
test-demo        blah-server-858ccd7788-mnf67              1/1     Running     0          64m     10.42.0.50      rpi4-server1    <none>           <none>
default          nginx2-6f4f6f76fc-n2kbq                   1/1     Running     0          22m     10.42.0.52      rpi4-server1    <none>           <none>
test-demo        blah-interface-587fc66bf9-qftv6           1/1     Running     0          22m     10.42.0.53      rpi4-server1    <none>           <none>
test-demo        blah-svc-6f8f68f46-gqcbw                  1/1     Running     0          21m     10.42.0.54      rpi4-server1    <none>           <none>
kube-system      coredns-d798c9dd-hdwn5                    1/1     Running     1          11h     10.42.0.27      rpi4-server1    <none>           <none>
kube-system      local-path-provisioner-58fb86bdfd-tjh7r   1/1     Running     31         11h     10.42.0.28      rpi4-server1    <none>           <none>
kube-system      traefik-6787cddb4b-tgq6j                  1/1     Running     0          4h50m   10.42.1.23      rpi4-server2    <none>           <none>
default          testdemo2020-testchart-6f8d44b496-2hcfc   1/1     Running     1          6h31m   10.42.0.29      rpi4-server1    <none>           <none>
test-demo        test-apis-75bb68dcd7-d8rrp                1/1     Running     0          7m13s   10.42.1.29      rpi4-server2    <none>           <none>
pi@rpi4-server1:~/Projects/test_demo_2020/test_kube_config/testchart/templates $ sudo kubectl get svc --all-namespaces -o wide
NAMESPACE     NAME                     TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)                                        AGE     SELECTOR
default       kubernetes               ClusterIP      10.43.0.1       <none>          443/TCP                                        11h     <none>
kube-system   kube-dns                 ClusterIP      10.43.0.10      <none>          53/UDP,53/TCP,9153/TCP                         11h     k8s-app=kube-dns
kube-system   metrics-server           ClusterIP      10.43.74.118    <none>          443/TCP                                        11h     k8s-app=metrics-server
kube-system   traefik-prometheus       ClusterIP      10.43.78.135    <none>          9100/TCP                                       11h     app=traefik,release=traefik
test-demo     blah-server              NodePort       10.43.224.128   <none>          5055:31211/TCP                                 10h     io.kompose.service=blah-server
default       testdemo2020-testchart   ClusterIP      10.43.91.7      <none>          80/TCP                                         10h     app.kubernetes.io/instance=testdemo2020,app.kubernetes.io/name=testchart
test-demo     traf-dashboard           NodePort       10.43.60.155    <none>          8080:30808/TCP                                 10h     io.kompose.service=traf-dashboard
test-demo     test-apis                NodePort       10.43.248.59    <none>          8075:31423/TCP                                 7h11m   io.kompose.service=test-apis
kube-system   traefik                  LoadBalancer   10.43.168.18    192.168.0.240   80:30688/TCP,443:31263/TCP                     11h     app=traefik,release=traefik
default       nginx2                   LoadBalancer   10.43.249.123   192.168.0.241   80:30497/TCP                                   92m     app=nginx2
test-demo     n34                      NodePort       10.43.171.206   <none>          7474:30474/TCP,7687:32051/TCP                  72m     io.kompose.service=n34
test-demo     blah-interface           LoadBalancer   10.43.149.158   192.168.0.242   80:30634/TCP                                   66m     io.kompose.service=blah-interface
test-demo     blah-svc                 LoadBalancer   10.43.19.242    192.168.0.243   5005:30005/TCP,5006:31904/TCP,5002:30685/TCP   51m     io.kompose.service=blah-svc