I have a cluster with 4 nodes (3 Raspberry Pis, 1 NUC) and have set up several different workloads. The cluster itself worked perfectly fine before, so I doubt it is a general problem with the configuration. After a reboot of all nodes the cluster came back up well and all pods are running without issues. Unfortunately, pods that are running on one of my nodes (the NUC) are not reachable via ingress anymore. If I access them through kube-proxy, I can see that the pods themselves run fine and the HTTP services behave as expected. I upgraded the NUC node from Ubuntu 20.10 to 21.04, which may be related to the issue, but that is not confirmed.
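For reference, this is roughly how I check the pods while bypassing the ingress; the namespace, service name and port are just placeholders for my gitea setup:

    # forward a local port straight to the service and curl it
    kubectl -n gitea port-forward svc/gitea 3000:3000 &
    curl -s http://localhost:3000/

    # or go through the API server proxy
    kubectl proxy &
    curl -s http://localhost:8001/api/v1/namespaces/gitea/services/gitea:3000/proxy/

Both ways the services answer normally, so the workloads themselves look healthy.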
When the same pods are scheduled on the other nodes, everything works as expected. For pods on the NUC node, I see the following in the ingress-controller logs:
2021/08/09 09:17:28 [error] 1497#1497: *1027899 upstream timed out (110: Operation timed out) while connecting to upstream, client: 10.244.1.1, server: gitea.fritz.box, request: "GET / HTTP/2.0", upstream: "http://10.244.3.50:3000/", host: "gitea.fritz.box"
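To narrow it down further, the upstream can be curled directly from inside the ingress-controller pod; something like the following (the ingress-nginx namespace and pod name are assumptions for a default installation, the upstream IP/port are taken from the log line above):

    # find the ingress controller pod
    kubectl -n ingress-nginx get pods -o wide

    # try to reach the upstream pod IP from inside the controller
    kubectl -n ingress-nginx exec -it <controller-pod> -- curl -v --max-time 5 http://10.244.3.50:3000/

This should show whether the controller can reach pods on the NUC node at all, independent of the ingress configuration.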
I can only assume that the problem is related to the cluster-internal network and have compared iptables rules and the like, but have not found any differences that seem relevant.
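For completeness, this is roughly what I compared between the NUC and one of the raspis:

    # which backend is iptables using on this node (legacy vs nf_tables)?
    iptables -V
    update-alternatives --display iptables

    # dump the rule sets for diffing between nodes
    iptables-save > /tmp/iptables-$(hostname).txt
    nft list ruleset > /tmp/nft-$(hostname).txt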
The NUC node runs Ubuntu 21.04 with kube v1.21.1, the raspis run Ubuntu 20.04.2 LTS. The master node still runs v1.21.1, while the two other worker nodes already run v1.22.0, which works fine.
I have found a thread that points out an incompatibility between MetalLB and nftables (https://github.com/metallb/metallb/issues/451), and though it's a bit older, I already switched to the legacy xtables backend as suggested (update-alternatives --set iptables /usr/sbin/iptables-legacy ...) without success.
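The full set of commands from the Kubernetes docs for switching a node to the legacy backend looks roughly like this (not all of the tools are necessarily installed on Ubuntu):

    update-alternatives --set iptables /usr/sbin/iptables-legacy
    update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
    update-alternatives --set arptables /usr/sbin/arptables-legacy
    update-alternatives --set ebtables /usr/sbin/ebtables-legacy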
Currently I'm running out of ideas on where to look. Can anyone suggest possible issues?
Thanks in advance!
Updating flannel from v0.13.1-rc2 to v0.14.0 seems to have done the trick. Maybe some of the iptables rules were broken and got recreated, maybe v0.14.0 is necessary to work with 21.04? Who knows... I'm back up and running fine and happy :)
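In case it helps someone else, the upgrade was just re-applying the flannel manifest and waiting for the DaemonSet to roll over; roughly like this (the manifest URL and the DaemonSet name are from the default kube-flannel deployment of that release, double-check against the flannel release notes for your version):

    kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/v0.14.0/Documentation/kube-flannel.yml
    kubectl -n kube-system rollout status daemonset kube-flannel-ds
    kubectl -n kube-system get pods -l app=flannel -o wide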