I have a network issue on my cluster. At first I thought it was a routing problem, but I've since discovered that outgoing packets from the cluster may not be getting rewritten with the node IP when they leave the node.
Background: I have two clusters. I set up the first one (months ago) manually using this guide and it worked great. The second one I have built multiple times as I created/debugged Ansible scripts to automate what I did by hand for the first cluster.
Cluster2 is the one with the network issue: I can get to pods on other nodes, but I can't get to anything on my regular network. I tcpdump'd the physical interface on node0 in cluster2 while pinging from a busybox pod, and the 172.16.0.x internal pod IP shows up at that interface as the source IP - and my network outside the node has no idea what to do with it. On cluster1 the same test shows the node IP in place of the pod IP, which is how I assume it should work.
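For reference, this is roughly the test I was running (the interface name and the target IP here are just examples - adjust for your setup):

# on the node, watch the physical interface for ICMP and check the source IP
tcpdump -ni eth0 icmp

# from a throwaway busybox pod, ping something outside the cluster
kubectl run busybox --rm -it --image=busybox -- ping -c 3 192.168.1.1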
My question is: how can I troubleshoot this? Any ideas would be great as I have been at this for several days now. Even the obvious is welcome, since I can no longer see the forest for the trees... i.e. both clusters look the same everywhere I know how to check :)
Caveat to "my clusters are the same": cluster1 is running kubectl 1.16, cluster2 is running 1.18.
----edit after @Matt dropped some kube-proxy knowledge on me----
I did not know that the kube-proxy rules could just be read with the iptables command! Awesome!
I think my problem is those 10.x addresses in the broken cluster. I don't even know where they came from, as they are not in any of my Ansible config scripts or kubeadm init files... I use all 172's in my configs.
I do pull some configs direct from source (flannel and the CSI/CPI stuff), so I'll pull those down and inspect them to see if the 10's are in there... Hopefully it's in the flannel defaults or something and I can just change that yml file!
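Something like this is what I mean by pulling it down and inspecting it (the coreos/flannel URL is just the usual location for kube-flannel.yml - substitute whatever your guide references):

# grab the flannel manifest and look for any 10.x subnets
curl -sO https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
grep -n "10\." kube-flannel.yml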
cluster1 working:
[root@k8s-master ~]# iptables -t nat -vnL| grep POSTROUTING -A5
Chain POSTROUTING (policy ACCEPT 22 packets, 1346 bytes)
pkts bytes target prot opt in out source destination
6743K 550M KUBE-POSTROUTING all -- * * 0.0.0.0/0 0.0.0.0/0 /* kubernetes postrouting rules */
0 0 MASQUERADE all -- * !docker0 172.17.0.0/16 0.0.0.0/0
3383K 212M RETURN all -- * * 172.16.0.0/16 172.16.0.0/16
117K 9002K MASQUERADE all -- * * 172.16.0.0/16 !224.0.0.0/4
0 0 RETURN all -- * * !172.16.0.0/16 172.16.0.0/24
0 0 MASQUERADE all -- * * !172.16.0.0/16 172.16.0.0/16
cluster2 - not working:
[root@testvm-master ~]# iptables -t nat -vnL | grep POSTROUTING -A5
Chain POSTROUTING (policy ACCEPT 1152 packets, 58573 bytes)
pkts bytes target prot opt in out source destination
719K 37M KUBE-POSTROUTING all -- * * 0.0.0.0/0 0.0.0.0/0 /* kubernetes postrouting rules */
0 0 RETURN all -- * * 10.244.0.0/16 10.244.0.0/16
0 0 MASQUERADE all -- * * 10.244.0.0/16 !224.0.0.0/4
131K 7849K RETURN all -- * * !10.244.0.0/16 172.16.0.0/24
0 0 MASQUERADE all -- * * !10.244.0.0/16 10.244.0.0/16
Boom! @Matt's advice for the win.
Using iptables to verify the NAT rules that flannel was applying did the trick. I was able to find the 10.244 subnet in the flannel config that was referenced in the guide I was using.
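For anyone else hitting this, the subnet is defined in the net-conf.json ConfigMap inside kube-flannel.yml - it looks roughly like this (exact contents may differ by flannel version):

grep -A6 net-conf.json kube-flannel.yml
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }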
I had two options: 1. download and alter the flannel yaml before deploying the CNI, or 2. make my kubeadm init subnet declaration match what flannel has.
I went with option 2 because I don't want to alter the flannel config every time... I just want to pull down their latest file and run with it. This worked quite nicely to resolve my issue.
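Concretely, option 2 just means telling kubeadm to use flannel's default pod network at init time (any other flags/config you already use stay the same):

# init the control plane with flannel's default pod network
kubeadm init --pod-network-cidr=10.244.0.0/16
# or, if you use a kubeadm config file, set networking.podSubnet: 10.244.0.0/16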