Kubernetes service on Windows node not reachable

12/5/2018

I'm working on a mixed Linux/Windows Kubernetes cluster. It currently has 4 nodes, running as VMs in a VMware cluster on a single physical server:

  • 3 Linux nodes running Debian stretch, configured using kubeadm
  • 1 Windows Server 2019 (1809) node, configured based on Microsoft's documentation.

I'm using flannel in host-gw mode for the networking as suggested by Microsoft. IPs are properly assigned to both pods and services in their respective ranges (10.244.0.0/16 for pods and 10.96.0.0/12 for services).

The whole cluster runs Kubernetes 1.13, upgraded from 1.12.3, with flannel binaries freshly downloaded today from Microsoft/SDN.

The PowerShell command used to start the services on the Windows node:

.\start.ps1 -ManagementIP 10.71.145.37 -ClusterCIDR 10.244.0.0/16 -ServiceCIDR 10.96.0.0/12 -KubeDnsServiceIP 10.96.0.10

What's working?

  • Linux pod -> Linux pod: yes
  • Linux pod -> Windows pod: yes
  • Windows pod -> Linux pod: yes
  • Windows pod -> Windows pod: yes
  • Linux pod -> Linux service: yes
  • Linux pod -> Windows service: no
  • Windows pod -> Linux service: no
  • Windows pod -> Windows service: no
  • Linux host -> Linux pod: yes
  • Linux host -> Windows pod: yes
  • Windows host -> Linux pod: yes
  • Windows host -> Windows pod: yes
  • Linux host -> Linux service: yes
  • Linux host -> Windows service: no
  • Windows host -> Linux service: no
  • Windows host -> Windows service: no

Long story short: direct connections to pods work across Windows and Linux, but service connections only work for Linux services (services backed by Linux pods), and only from Linux pods or hosts.
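
For anyone wanting to reproduce a matrix like the one above, here is a minimal sketch of a TCP probe. It is not what I actually ran; the function names and any targets passed in are hypothetical. It tests a TCP connect to a specific port rather than ICMP, on the assumption that a virtual service IP may only handle the ports defined on the Service.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


def probe_matrix(targets, timeout=2.0):
    """Probe each (label, host, port) tuple and map label -> 'yes' / 'no'.

    The labels, hosts and ports are supplied by the caller; none are built in.
    """
    results = {}
    for label, host, port in targets:
        results[label] = "yes" if can_connect(host, port, timeout) else "no"
    return results
```

Run from each pod or host in turn, a call such as probe_matrix([("kube-dns service", "10.96.0.10", 53)]) would fill in one row of the matrix; port 53 here is an assumption about the kube-dns Service.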

DNS resolution is also working, although I cannot resolve names of the form service.namespace from Windows pods. Only the bare hostname or the FQDN work there, nothing in between.

Routing tables from the Linux nodes:

# host linux-node-1: 10.71.144.71
root@linux-node-1:~# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         10.71.144.1     0.0.0.0         UG    0      0        0 ens32
10.71.144.0     0.0.0.0         255.255.252.0   U     0      0        0 ens32
10.244.0.0      0.0.0.0         255.255.255.0   U     0      0        0 cni0
10.244.1.0      linux-node-2    255.255.255.0   UG    0      0        0 ens32
10.244.2.0      linux-node-3    255.255.255.0   UG    0      0        0 ens32
10.244.5.0      windows-node-1  255.255.255.0   UG    0      0        0 ens32
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0

# host linux-node-2: 10.71.147.15
root@linux-node-2:~# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         10.71.144.1     0.0.0.0         UG    0      0        0 ens32
10.71.144.0     0.0.0.0         255.255.252.0   U     0      0        0 ens32
10.244.0.0      linux-node-1    255.255.255.0   UG    0      0        0 ens32
10.244.1.0      0.0.0.0         255.255.255.0   U     0      0        0 cni0
10.244.2.0      linux-node-3    255.255.255.0   UG    0      0        0 ens32
10.244.5.0      windows-node-1  255.255.255.0   UG    0      0        0 ens32
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0

# host linux-node-3: 10.71.144.123
root@linux-node-3:~# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         10.71.144.1     0.0.0.0         UG    0      0        0 ens32
10.71.144.0     0.0.0.0         255.255.252.0   U     0      0        0 ens32
10.244.0.0      linux-node-1    255.255.255.0   UG    0      0        0 ens32
10.244.1.0      linux-node-2    255.255.255.0   UG    0      0        0 ens32
10.244.2.0      0.0.0.0         255.255.255.0   U     0      0        0 cni0
10.244.5.0      windows-node-1  255.255.255.0   UG    0      0        0 ens32
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0

Routing table from the Windows node:

PS C:\k> route print
===========================================================================
Interface List
  9...00 50 56 89 69 ce ......Hyper-V Virtual Ethernet Adapter #2
 21...00 15 5d 8d 98 26 ......Hyper-V Virtual Ethernet Adapter #3
  1...........................Software Loopback Interface 1
 12...00 15 5d 84 c0 c9 ......Hyper-V Virtual Ethernet Adapter
===========================================================================

IPv4 Route Table
===========================================================================
Active Routes:
Network Destination        Netmask          Gateway       Interface  Metric
          0.0.0.0          0.0.0.0      10.71.144.1     10.71.145.37     25
          0.0.0.0          0.0.0.0       10.244.5.1       10.244.5.2    281
      10.71.144.0    255.255.252.0         On-link      10.71.145.37    281
     10.71.145.37  255.255.255.255         On-link      10.71.145.37    281
     10.71.145.37  255.255.255.255      10.71.144.1     10.71.145.37    125
    10.71.147.255  255.255.255.255         On-link      10.71.145.37    281
       10.244.0.0    255.255.255.0     10.71.144.71     10.71.145.37    281
       10.244.1.0    255.255.255.0     10.71.147.15     10.71.145.37    281
       10.244.2.0    255.255.255.0    10.71.144.123     10.71.145.37    281
       10.244.5.0    255.255.255.0         On-link        10.244.5.2    281
       10.244.5.2  255.255.255.255         On-link        10.244.5.2    281
     10.244.5.255  255.255.255.255         On-link        10.244.5.2    281
        127.0.0.0        255.0.0.0         On-link         127.0.0.1    331
        127.0.0.1  255.255.255.255         On-link         127.0.0.1    331
  127.255.255.255  255.255.255.255         On-link         127.0.0.1    331
      172.27.80.0    255.255.240.0         On-link       172.27.80.1   5256
      172.27.80.1  255.255.255.255         On-link       172.27.80.1   5256
    172.27.95.255  255.255.255.255         On-link       172.27.80.1   5256
        224.0.0.0        240.0.0.0         On-link         127.0.0.1    331
        224.0.0.0        240.0.0.0         On-link       172.27.80.1   5256
        224.0.0.0        240.0.0.0         On-link      10.71.145.37    281
        224.0.0.0        240.0.0.0         On-link        10.244.5.2    281
  255.255.255.255  255.255.255.255         On-link         127.0.0.1    331
  255.255.255.255  255.255.255.255         On-link       172.27.80.1   5256
  255.255.255.255  255.255.255.255         On-link      10.71.145.37    281
  255.255.255.255  255.255.255.255         On-link        10.244.5.2    281
===========================================================================
Persistent Routes:
  Network Address          Netmask  Gateway Address  Metric
          0.0.0.0          0.0.0.0       10.244.5.1  Default
       10.244.0.0    255.255.255.0     10.71.144.71  Default
       10.244.1.0    255.255.255.0     10.71.147.15  Default
       10.244.2.0    255.255.255.0    10.71.144.123  Default
          0.0.0.0          0.0.0.0       10.244.5.2  Default
     10.71.145.37  255.255.255.255      10.71.144.1     100
===========================================================================
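
To make the table above easier to reason about, here is a minimal sketch of a longest-prefix-match lookup over the main active routes. The route list is copied from the output above (host, broadcast, loopback and multicast entries omitted); the lookup function is an illustration only. It models the IP route table alone and does not model any translation of service IPs that kube-proxy may perform before routing.

```python
import ipaddress

# (destination in CIDR form, gateway, metric), taken from the active routes above.
# 255.255.252.0 is written as /22 and 255.255.255.0 as /24.
ROUTES = [
    ("0.0.0.0/0", "10.71.144.1", 25),
    ("0.0.0.0/0", "10.244.5.1", 281),
    ("10.71.144.0/22", "On-link", 281),
    ("10.244.0.0/24", "10.71.144.71", 281),
    ("10.244.1.0/24", "10.71.147.15", 281),
    ("10.244.2.0/24", "10.71.144.123", 281),
    ("10.244.5.0/24", "On-link", 281),
]


def lookup(address):
    """Return the gateway of the best matching route for address.

    The longest prefix wins; among equal prefixes the lower metric wins.
    """
    ip = ipaddress.ip_address(address)
    candidates = []
    for cidr, gateway, metric in ROUTES:
        net = ipaddress.ip_network(cidr)
        if ip in net:
            candidates.append((net.prefixlen, -metric, gateway))
    return max(candidates)[2]


for target in ("10.244.1.7", "10.244.5.9", "10.96.0.10"):
    print(target, "->", lookup(target))
```

In this simplified model, pod addresses on other nodes match their /24 host-gw routes, while a service address such as 10.96.0.10 matches only the default route.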

Traceroute from the Windows pod to kube-dns:

C:\>tracert -4 -d -h 10 10.96.0.10

Tracing route to 10.96.0.10 over a maximum of 10 hops
  1     *        *        *     Request timed out.
  2     *        *        *     Request timed out.
  3     *        *        *     Request timed out.
  4     *        *        *     Request timed out.
  5     *        *        *     Request timed out.
  6     *        *        *     Request timed out.
  7     *        *        *     Request timed out.
  8     *        *        *     Request timed out.
  9     *        *        *     Request timed out.
 10     *        *        *     Request timed out.

Trace complete.

Traceroute from a Linux pod to kube-dns:

root@deb:/# traceroute -4 -n 10.96.0.10
traceroute to 10.96.0.10 (10.96.0.10), 30 hops max, 60 byte packets
 1  10.244.2.1  0.396 ms  0.336 ms  0.314 ms
 2  10.71.144.1  7.044 ms  9.939 ms  10.062 ms
 3  10.71.144.2  1.727 ms  1.917 ms 10.71.144.3  1.233 ms
 4  10.68.132.166  6.985 ms 10.68.132.162  7.934 ms  8.404 ms
 5  10.103.4.246  203.807 ms  203.405 ms  203.777 ms
 6  10.103.4.245  209.431 ms  209.348 ms  209.772 ms
 7  10.96.108.86  496.457 ms  502.957 ms  494.978 ms
 8  10.96.0.10  211.666 ms * *

Hop 1 is a pod network address, hops 2 and 3 are the default gateway (VRRP) of the Linux host, hop 7 is a switch in the physical network, and hop 8 is the kube-dns service. The rest of the hops (4-6) are probably Cisco routers in the physical network.
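
As a cross-check of the hop descriptions above, here is a minimal sketch that classifies each hop address from the Linux traceroute against the pod range, the service range, and the node subnet (10.71.144.0/22, taken from the routing tables). The function and label names are hypothetical; the sketch only does range arithmetic and does not diagnose anything.

```python
import ipaddress

POD_CIDR = ipaddress.ip_network("10.244.0.0/16")
SERVICE_CIDR = ipaddress.ip_network("10.96.0.0/12")
NODE_SUBNET = ipaddress.ip_network("10.71.144.0/22")

# Hop addresses in the order they appear in the Linux traceroute output.
HOPS = [
    "10.244.2.1",
    "10.71.144.1",
    "10.71.144.2", "10.71.144.3",
    "10.68.132.166", "10.68.132.162",
    "10.103.4.246",
    "10.103.4.245",
    "10.96.108.86",
    "10.96.0.10",
]


def classify(address):
    """Label an address as 'pod', 'service', 'node' or 'other' by range."""
    ip = ipaddress.ip_address(address)
    for label, net in (("pod", POD_CIDR), ("service", SERVICE_CIDR), ("node", NODE_SUBNET)):
        if ip in net:
            return label
    return "other"


for hop in HOPS:
    print(hop, classify(hop))
```

By this arithmetic, the addresses seen at hops 5 to 8 all fall inside 10.96.0.0/12, so it may be worth checking whether the service range overlaps with addresses used in the physical network.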

The fact that DNS queries are working and that I can ping 10.96.0.1 (the kubernetes service) and 10.96.0.10 (kube-dns) from the host leads me to believe the routing is working. However, I can't ping any other service address, nor can I e.g. curl my ingress controller from the Windows host.

Disabling the Windows firewall did not make a difference either.

I'm out of ideas on what else to check here, and searching around turns up barely anything applicable.

-- pschichtel
docker
flannel
kubernetes
networking
windows

1 Answer

12/13/2018

Regarding Windows services failing: Can you post the CollectLogs.ps1 output (https://raw.githubusercontent.com/Microsoft/SDN/master/Kubernetes/windows/debug/collectlogs.ps1) and your CNI config file? Can the Windows pods reach the external internet (e.g. curl -useb http://google.com)?

Also, there is a video recently presented at KubeCon that goes into detail on how to troubleshoot Kubernetes networking problems on Windows that you may find helpful: https://www.youtube.com/watch?v=tTZFoiLObX4&feature=youtu.be

Regarding resolution of service.namespace: unfortunately, that is a difference in behavior today, by design of the DNS resolver on Windows, whereby any name searches containing dots are treated authoritatively. This is also why the default CNI config file doesn't specify the DNS suffixes that would be needed in the SearchList, since they would not take effect today. This behavior won't change before the Windows Server, version 1903 release.
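
To illustrate the difference described above, here is a minimal sketch comparing which names each resolver style would try. The Linux side follows the search/ndots rules from resolv.conf with ndots set to 5, as Kubernetes configures for pods by default. The Windows side is a simplification of the behavior described in this answer, not the actual resolver logic. The search list assumes a pod in the "default" namespace and the cluster domain "cluster.local"; both are assumptions, as are the function names.

```python
# Assumed search list for a pod in namespace "default" with domain "cluster.local".
SEARCH = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]


def linux_candidates(name, search=SEARCH, ndots=5):
    """Names tried in order under resolv.conf-style search/ndots rules."""
    if name.endswith("."):
        return [name]  # already fully qualified, no search list applied
    expanded = [f"{name}.{suffix}." for suffix in search]
    absolute = name + "."
    if name.count(".") >= ndots:
        return [absolute] + expanded
    return expanded + [absolute]


def windows_candidates(name, search=SEARCH):
    """Simplified model: a name containing a dot is queried as-is."""
    if "." in name:
        return [name.rstrip(".") + "."]
    return [f"{name}.{suffix}." for suffix in search]


for query in ("kube-dns", "kube-dns.kube-system"):
    print(query, linux_candidates(query), windows_candidates(query))
```

Under these assumptions, the Linux-style list for kube-dns.kube-system includes kube-dns.kube-system.svc.cluster.local., while the Windows-style list contains only kube-dns.kube-system. itself, which matches the symptom that only the hostname or the FQDN resolve.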

-- daschott
Source: StackOverflow