iptables rules for kube-dns

6/29/2021

I looked into the iptables rules used by kube-dns, and I'm a little confused by the sub-chain "KUBE-SEP-V7KWRXXOBQHQVWAT"; its contents are below. My question is: why do we need the "KUBE-MARK-MASQ" target when the source IP address (172.18.1.5) is the kube-dns pod's IP address? Per my understanding, 172.18.1.5 should appear as the destination address, not the source address, because all the DNS queries come from other addresses (services); the DNS queries cannot originate from kube-dns itself.

# iptables -t nat -L KUBE-SEP-V7KWRXXOBQHQVWAT
Chain KUBE-SEP-V7KWRXXOBQHQVWAT (1 references)
target     prot opt source               destination
KUBE-MARK-MASQ  all  --  172.18.1.5           anywhere             /* kube-system/kube-dns:dns-tcp */
DNAT       tcp  --  anywhere             anywhere             /* kube-system/kube-dns:dns-tcp */ tcp to:172.18.1.5:53

Here is the full chain information:

# iptables -t nat -L KUBE-SERVICES
Chain KUBE-SERVICES (2 references)
target     prot opt source               destination
KUBE-MARK-MASQ  tcp  -- !172.18.1.0/24        10.0.62.222          /* kube-system/metrics-server cluster IP */ tcp dpt:https
KUBE-SVC-QMWWTXBG7KFJQKLO  tcp  --  anywhere             10.0.62.222          /* kube-system/metrics-server cluster IP */ tcp dpt:https
KUBE-MARK-MASQ  tcp  -- !172.18.1.0/24        10.0.213.2           /* kube-system/healthmodel-replicaset-service cluster IP */ tcp dpt:25227
KUBE-SVC-WT3SFWJ44Q74XUPR  tcp  --  anywhere             10.0.213.2           /* kube-system/healthmodel-replicaset-service cluster IP */ tcp dpt:25227
KUBE-MARK-MASQ  tcp  -- !172.18.1.0/24        10.0.0.1             /* default/kubernetes:https cluster IP */ tcp dpt:https
KUBE-SVC-NPX46M4PTMTKRN6Y  tcp  --  anywhere             10.0.0.1             /* default/kubernetes:https cluster IP */ tcp dpt:https
KUBE-MARK-MASQ  udp  -- !172.18.1.0/24        10.0.0.10            /* kube-system/kube-dns:dns cluster IP */ udp dpt:domain
KUBE-SVC-TCOU7JCQXEZGVUNU  udp  --  anywhere             10.0.0.10            /* kube-system/kube-dns:dns cluster IP */ udp dpt:domain
KUBE-MARK-MASQ  tcp  -- !172.18.1.0/24        10.0.0.10            /* kube-system/kube-dns:dns-tcp cluster IP */ tcp dpt:domain
KUBE-SVC-ERIFXISQEP7F7OF4  tcp  --  anywhere             10.0.0.10            /* kube-system/kube-dns:dns-tcp cluster IP */ tcp dpt:domain
KUBE-NODEPORTS  all  --  anywhere             anywhere             /* kubernetes service nodeports; NOTE: this must be the last rule in this chain */ ADDRTYPE match dst-type LOCAL
# iptables -t nat -L KUBE-SVC-ERIFXISQEP7F7OF4
Chain KUBE-SVC-ERIFXISQEP7F7OF4 (1 references)
target     prot opt source               destination
KUBE-SEP-V7KWRXXOBQHQVWAT  all  --  anywhere             anywhere             /* kube-system/kube-dns:dns-tcp */ statistic mode random probability 0.50000000000
KUBE-SEP-BWCLCJLZ5KI6FXBW  all  --  anywhere             anywhere             /* kube-system/kube-dns:dns-tcp */
# iptables -t nat -L KUBE-SEP-V7KWRXXOBQHQVWAT
Chain KUBE-SEP-V7KWRXXOBQHQVWAT (1 references)
target     prot opt source               destination
KUBE-MARK-MASQ  all  --  172.18.1.5           anywhere             /* kube-system/kube-dns:dns-tcp */
DNAT       tcp  --  anywhere             anywhere             /* kube-system/kube-dns:dns-tcp */ tcp to:172.18.1.5:53
-- jianrui
iptables
kubernetes

1 Answer

6/30/2021

You can think of Kubernetes service routing in iptables as the following steps (a sketch of the generated rules follows this list):

1. Loop through the chain holding all Kubernetes services (KUBE-SERVICES).
2. If you hit a matching service IP and port, jump to that service's chain (KUBE-SVC-...).
3. The service chain randomly selects one endpoint chain (KUBE-SEP-...) from the list of endpoints, using the statistic-mode probabilities.
4. If the selected endpoint has the same IP as the source address of the traffic, mark the packet for MASQUERADE later (this is the KUBE-MARK-MASQ you are asking about). In other words, if a pod tries to talk to a service IP and that service IP "resolves" to the pod itself, the packet needs to be marked for MASQUERADE; the actual MASQUERADE target is in the POSTROUTING chain because it's only allowed to happen there.
5. Do the DNAT to the selected endpoint IP and port. This happens regardless of whether step 4 matched or not.
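Concretely, for the dns-tcp endpoint in your question, the same rules should look roughly like this in iptables-save form (the 0x4000 mark is kube-proxy's default masquerade bit; your cluster may be configured differently):

# iptables -t nat -S KUBE-SEP-V7KWRXXOBQHQVWAT
-A KUBE-SEP-V7KWRXXOBQHQVWAT -s 172.18.1.5/32 -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-MARK-MASQ
-A KUBE-SEP-V7KWRXXOBQHQVWAT -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp" -m tcp -j DNAT --to-destination 172.18.1.5:53
# iptables -t nat -S KUBE-MARK-MASQ
-A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000

Note that KUBE-MARK-MASQ only sets a mark bit on the packet; nothing is rewritten at this point.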

If you look at iptables -t nat -L POSTROUTING, there will be a rule that matches these marked packets; that is where the MASQUERADE actually happens.
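I don't have your node's output, but on a typical kube-proxy setup those rules look roughly like this (the exact chain layout and flags vary between kube-proxy versions):

# iptables -t nat -S POSTROUTING | grep -i kube
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
# iptables -t nat -S KUBE-POSTROUTING
-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -m mark --mark 0x4000/0x4000 -j MASQUERADE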

The reason why the KUBE-MARK-MASQ rule has to exist is hairpin NAT. The details are somewhat involved, but here's my best attempt:

If MASQUERADE didn't happen, traffic would leave the pod's network namespace as (pod IP, source port -> virtual IP, virtual port), be DNAT'd to (pod IP, source port -> pod IP, service port), and be immediately sent back to the pod. The traffic would therefore arrive at the serving process with (pod IP, source port) as its source, so when the service replies it replies to (pod IP, source port). But the pod (the kernel, really) is expecting the reply to come back from the same IP and port it originally sent the traffic to, which is (virtual IP, virtual port), and thus the reply gets dropped.
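To make that concrete with the addresses from your question (kube-dns pod 172.18.1.5, cluster IP 10.0.0.10; the ephemeral port 40000 and the node IP are made up for illustration):

Without MASQUERADE (broken hairpin):
  pod sends        172.18.1.5:40000 -> 10.0.0.10:53
  after DNAT       172.18.1.5:40000 -> 172.18.1.5:53
  pod replies      172.18.1.5:53    -> 172.18.1.5:40000   client expected the reply to come
                                                          from 10.0.0.10:53, so it is dropped

With KUBE-MARK-MASQ + MASQUERADE (working hairpin):
  pod sends        172.18.1.5:40000 -> 10.0.0.10:53
  after DNAT+SNAT  <node IP>:<random port> -> 172.18.1.5:53
  pod replies      172.18.1.5:53 -> <node IP>:<random port>   conntrack reverses both NATs, and
                                                              the client sees the reply arrive
                                                              from 10.0.0.10:53 as expected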

-- maxstr
Source: StackOverflow