I'm working on a mixed Linux/Windows Kubernetes cluster. It currently has four nodes, running as VMs in a VMware cluster on a single physical server.
I'm using flannel in host-gw mode for the networking as suggested by Microsoft. IPs are properly assigned to both pods and services in their respective ranges (10.244.0.0/16 for pods and 10.96.0.0/12 for services).
The whole thing is running Kubernetes 1.13, upgraded from 1.12.3, with the flannel binaries freshly downloaded today from Microsoft/SDN.
The Windows PowerShell command used to start the services:
.\start.ps1 -ManagementIP 10.71.145.37 -ClusterCIDR 10.244.0.0/16 -ServiceCIDR 10.96.0.0/12 -KubeDnsServiceIP 10.96.0.10
What's working?
Long story short: direct connections to pods work across Windows and Linux; service connections only work for Linux services (services backed by Linux pods), and only from Linux pods or hosts.
DNS resolution also works, although I cannot resolve service.namespace from Windows pods: only the bare service name or the full FQDN resolve, nothing in between.
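To make that concrete, the pattern is roughly this (pod names, service IPs and ports below are placeholders, and curl is assumed to be available in the client images):
# Direct pod IPs: reachable from both Linux and Windows clients.
kubectl exec linux-pod -- curl -s http://10.244.5.10:80       # Windows pod IP - OK
kubectl exec win-pod -- curl.exe -s http://10.244.1.20:80     # Linux pod IP - OK
# Service backed by Linux pods: reachable from Linux clients only.
kubectl exec linux-pod -- curl -s http://10.96.10.20:80       # OK
kubectl exec win-pod -- curl.exe -s http://10.96.10.20:80     # times out
# Service backed by Windows pods: not reachable from anywhere.
kubectl exec linux-pod -- curl -s http://10.96.10.30:80       # times out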
Routing tables from the Linux nodes:
# host linux-node-1: 10.71.144.71
root@linux-node-1:~# route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
default 10.71.144.1 0.0.0.0 UG 0 0 0 ens32
10.71.144.0 0.0.0.0 255.255.252.0 U 0 0 0 ens32
10.244.0.0 0.0.0.0 255.255.255.0 U 0 0 0 cni0
10.244.1.0 linux-node-2 255.255.255.0 UG 0 0 0 ens32
10.244.2.0 linux-node-3 255.255.255.0 UG 0 0 0 ens32
10.244.5.0 windows-node-1 255.255.255.0 UG 0 0 0 ens32
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
# host linux-node-2: 10.71.147.15
root@linux-node-2:~# route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
default 10.71.144.1 0.0.0.0 UG 0 0 0 ens32
10.71.144.0 0.0.0.0 255.255.252.0 U 0 0 0 ens32
10.244.0.0 linux-node-1 255.255.255.0 UG 0 0 0 ens32
10.244.1.0 0.0.0.0 255.255.255.0 U 0 0 0 cni0
10.244.2.0 linux-node-3 255.255.255.0 UG 0 0 0 ens32
10.244.5.0 windows-node-1 255.255.255.0 UG 0 0 0 ens32
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
# host linux-node-3: 10.71.144.123
root@linux-node-3:~# route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
default 10.71.144.1 0.0.0.0 UG 0 0 0 ens32
10.71.144.0 0.0.0.0 255.255.252.0 U 0 0 0 ens32
10.244.0.0 linux-node-1 255.255.255.0 UG 0 0 0 ens32
10.244.1.0 linux-node-2 255.255.255.0 UG 0 0 0 ens32
10.244.2.0 0.0.0.0 255.255.255.0 U 0 0 0 cni0
10.244.5.0 windows-node-1 255.255.255.0 UG 0 0 0 ens32
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
Routing table from the Windows node:
PS C:\k> route print
===========================================================================
Interface List
9...00 50 56 89 69 ce ......Hyper-V Virtual Ethernet Adapter #2
21...00 15 5d 8d 98 26 ......Hyper-V Virtual Ethernet Adapter #3
1...........................Software Loopback Interface 1
12...00 15 5d 84 c0 c9 ......Hyper-V Virtual Ethernet Adapter
===========================================================================
IPv4 Route Table
===========================================================================
Active Routes:
Network Destination Netmask Gateway Interface Metric
0.0.0.0 0.0.0.0 10.71.144.1 10.71.145.37 25
0.0.0.0 0.0.0.0 10.244.5.1 10.244.5.2 281
10.71.144.0 255.255.252.0 On-link 10.71.145.37 281
10.71.145.37 255.255.255.255 On-link 10.71.145.37 281
10.71.145.37 255.255.255.255 10.71.144.1 10.71.145.37 125
10.71.147.255 255.255.255.255 On-link 10.71.145.37 281
10.244.0.0 255.255.255.0 10.71.144.71 10.71.145.37 281
10.244.1.0 255.255.255.0 10.71.147.15 10.71.145.37 281
10.244.2.0 255.255.255.0 10.71.144.123 10.71.145.37 281
10.244.5.0 255.255.255.0 On-link 10.244.5.2 281
10.244.5.2 255.255.255.255 On-link 10.244.5.2 281
10.244.5.255 255.255.255.255 On-link 10.244.5.2 281
127.0.0.0 255.0.0.0 On-link 127.0.0.1 331
127.0.0.1 255.255.255.255 On-link 127.0.0.1 331
127.255.255.255 255.255.255.255 On-link 127.0.0.1 331
172.27.80.0 255.255.240.0 On-link 172.27.80.1 5256
172.27.80.1 255.255.255.255 On-link 172.27.80.1 5256
172.27.95.255 255.255.255.255 On-link 172.27.80.1 5256
224.0.0.0 240.0.0.0 On-link 127.0.0.1 331
224.0.0.0 240.0.0.0 On-link 172.27.80.1 5256
224.0.0.0 240.0.0.0 On-link 10.71.145.37 281
224.0.0.0 240.0.0.0 On-link 10.244.5.2 281
255.255.255.255 255.255.255.255 On-link 127.0.0.1 331
255.255.255.255 255.255.255.255 On-link 172.27.80.1 5256
255.255.255.255 255.255.255.255 On-link 10.71.145.37 281
255.255.255.255 255.255.255.255 On-link 10.244.5.2 281
===========================================================================
Persistent Routes:
Network Address Netmask Gateway Address Metric
0.0.0.0 0.0.0.0 10.244.5.1 Default
10.244.0.0 255.255.255.0 10.71.144.71 Default
10.244.1.0 255.255.255.0 10.71.147.15 Default
10.244.2.0 255.255.255.0 10.71.144.123 Default
0.0.0.0 0.0.0.0 10.244.5.2 Default
10.71.145.37 255.255.255.255 10.71.144.1 100
===========================================================================
Traceroute from the Windows pod to kube-dns:
C:\>tracert -4 -d -h 10 10.96.0.10
Tracing route to 10.96.0.10 over a maximum of 10 hops
1 * * * Request timed out.
2 * * * Request timed out.
3 * * * Request timed out.
4 * * * Request timed out.
5 * * * Request timed out.
6 * * * Request timed out.
7 * * * Request timed out.
8 * * * Request timed out.
9 * * * Request timed out.
10 * * * Request timed out.
Trace complete.
Traceroute from a Linux pod to kube-dns:
root@deb:/# traceroute -4 -n 10.96.0.10
traceroute to 10.96.0.10 (10.96.0.10), 30 hops max, 60 byte packets
1 10.244.2.1 0.396 ms 0.336 ms 0.314 ms
2 10.71.144.1 7.044 ms 9.939 ms 10.062 ms
3 10.71.144.2 1.727 ms 1.917 ms 10.71.144.3 1.233 ms
4 10.68.132.166 6.985 ms 10.68.132.162 7.934 ms 8.404 ms
5 10.103.4.246 203.807 ms 203.405 ms 203.777 ms
6 10.103.4.245 209.431 ms 209.348 ms 209.772 ms
7 10.96.108.86 496.457 ms 502.957 ms 494.978 ms
8 10.96.0.10 211.666 ms * *
Hop 1 is the pod network gateway on the node, hops 2 and 3 are the default gateway (VRRP) of the Linux host, hop 7 is a switch in the physical network, hop 8 is the kube-dns service, and the remaining hops (4-6) are probably Cisco routers in the physical network.
The fact that DNS queries work and that I can ping 10.96.0.1 (the kubernetes service) and 10.96.0.10 (kube-dns) from the host leads me to believe the routing is working, but I can't ping any other service address, nor can I e.g. curl my ingress controller from the Windows host.
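For reference, the checks from the Windows host look roughly like this ($ingressVip is a placeholder for the ingress controller's ClusterIP):
ping 10.96.0.1        # kubernetes service VIP - replies
ping 10.96.0.10       # kube-dns service VIP - replies
$ingressVip = '10.96.0.100'                    # placeholder value
curl -UseBasicParsing "http://$ingressVip/"    # times out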
Disabling the Windows firewall did not make a difference either.
I'm out of ideas on what else to check here, and googling around turns up barely anything applicable.
Regarding Windows services failing: can you post the CollectLogs.ps1 output (https://raw.githubusercontent.com/Microsoft/SDN/master/Kubernetes/windows/debug/collectlogs.ps1) and your CNI config file? Can the Windows pods reach the external internet (e.g. curl -useb http://google.com)?
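In case it saves a step, fetching and running the script might look roughly like this (using C:\k as the working directory, matching the prompt shown above, and a placeholder pod name for the connectivity check):
# Run from an elevated PowerShell prompt on the Windows node.
cd C:\k
Invoke-WebRequest -UseBasicParsing -Uri https://raw.githubusercontent.com/Microsoft/SDN/master/Kubernetes/windows/debug/collectlogs.ps1 -OutFile .\collectlogs.ps1
.\collectlogs.ps1
# Quick external-connectivity check from inside a Windows pod (win-pod is a placeholder):
kubectl exec win-pod -- powershell -Command "curl -UseBasicParsing http://google.com"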
Also, there is a video recently presented at KubeCon that goes into detail on how to troubleshoot Kubernetes networking problems on Windows, which you may find helpful: https://www.youtube.com/watch?v=tTZFoiLObX4&feature=youtu.be
Regarding resolution of service.namespace: unfortunately, that is a difference in behavior today, by design of the DNS resolver on Windows, whereby any name query containing dots is treated as authoritative and the suffix search list is not applied. This is also why the default CNI config file doesn't specify the needed DNS suffixes in the SearchList: it wouldn't work today anyway. This behavior won't change before the Windows Server, version 1903 release.
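To illustrate the resolver behavior from inside a Windows pod (my-service / my-namespace are placeholder names):
ping -n 1 my-service                                   # single label: suffix search list is applied, resolves
ping -n 1 my-service.my-namespace.svc.cluster.local    # fully qualified name: resolves
ping -n 1 my-service.my-namespace                      # contains a dot: treated as authoritative, search list not applied, resolution fails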