Name resolution from Windows pods does not work on Kubernetes 1.18.1

4/10/2020

I have the setup shown below, with Flannel in VXLAN mode. Name resolution does not work from the Windows nodes. I verified that the following works:

  1. Windows POD -> external DNS server resolution
  2. Windows POD -> HTTPS connection to kubernetes API service IP
  3. Linux POD on master -> Name resolution against DNS service

The following does not work:

  1. Windows POD -> DNS query against DNS service
  2. Windows POD -> DNS query against IP of DNS pod

gregory@master1:~$ k get nodes
NAME         STATUS   ROLES    AGE    VERSION
master1      Ready    master   22h    v1.18.1
winworker1   Ready    <none>   15h    v1.18.1
winworker2   Ready    <none>   169m   v1.18.1

DNS repro

PS C:\> Test-NetConnection 10.96.0.10 -port 53
WARNING: TCP connect to (10.96.0.10 : 53) failed
ComputerName           : 10.96.0.10
RemoteAddress          : 10.96.0.10
RemotePort             : 53
InterfaceAlias         : vEthernet (62a92abe4497c380bae9dfdee71ae5069cd0bd1b66208f58016345b7a6d9fabe_flannel.4096)
SourceAddress          : 10.244.1.4
PingSucceeded          : False
PingReplyDetails (RTT) : 0 ms
TcpTestSucceeded       : False
PS C:\> Test-NetConnection 10.96.0.1 -port 443
ComputerName     : 10.96.0.1
RemoteAddress    : 10.96.0.1
RemotePort       : 443
InterfaceAlias   : vEthernet (62a92abe4497c380bae9dfdee71ae5069cd0bd1b66208f58016345b7a6d9fabe_flannel.4096)
SourceAddress    : 10.244.1.4
TcpTestSucceeded : True
PS C:\> Resolve-dnsname www.google.com -server 8.8.8.8
Name                                           Type   TTL   Section    IPAddress
----                                           ----   ---   -------    ---------
www.google.com                                 AAAA   299   Answer     2607:f8b0:4004:811::2004
www.google.com                                 A      299   Answer     172.217.15.100
PS C:\> Resolve-dnsname www.google.com -server 10.96.0.10
Resolve-dnsname : www.google.com : This operation returned because the timeout period expired
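
The checks above can be collected into one script so they are easy to rerun from a Windows pod after each change. This is only a sketch: the IP addresses are the ones from the output above, and the variable names and labels are made up for illustration.

```powershell
# TCP targets taken from the output above; adjust for another cluster.
$tcpTargets = @(
    @{ Name = 'DNS service'; Address = '10.96.0.10'; Port = 53 },
    @{ Name = 'API service'; Address = '10.96.0.1';  Port = 443 }
)

foreach ($t in $tcpTargets) {
    # Suppress the "TCP connect failed" warning and report only the result.
    $r = Test-NetConnection -ComputerName $t.Address -Port $t.Port -WarningAction SilentlyContinue
    '{0} ({1}:{2}) TCP reachable: {3}' -f $t.Name, $t.Address, $t.Port, $r.TcpTestSucceeded
}

# Compare an external DNS server with the cluster DNS service.
foreach ($server in '8.8.8.8', '10.96.0.10') {
    $answer = Resolve-DnsName -Name www.google.com -Server $server -ErrorAction SilentlyContinue
    '{0} resolves www.google.com: {1}' -f $server, [bool]$answer
}
```

With the behavior described in this question, the lines for 10.96.0.10 would be expected to report False while the others report True.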
-- Gregory Suvalian
flannel
kubernetes
windows-container
windows-server-container

1 Answer

4/11/2020

FYI: Kubernetes 1.18.1 has a bug on Windows nodes, which fail to create the network named "host" on reboot (https://github.com/kubernetes-sigs/sig-windows-tools/issues/52). As a result, communication within flannel is broken, even if you recreate the network manually with docker network create -d nat host.

To make DNS resolution work again, you also need to restart the Rancher wins service:

get-service rancher-wins | Restart-Service

The complete solution until this is fixed is to modify the StartKubelet.ps1 file and add the following to it on line 3:

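# Look up the ID of the Docker network named "host" (empty if it does not exist)
$netId = docker network ls -f name=host --format "{{ .ID }}"
# Recreate the NAT network if it is missing, e.g. after a reboot
if ($netId.Length -lt 1) {
    docker network create -d nat host
}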
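
A possible variant of the snippet above also restarts the Rancher wins service in the same step. This is a sketch under an assumption: the restart is taken to be needed only when the network had to be recreated, and to come after the network is created. I have not confirmed that ordering.

```powershell
$netId = docker network ls -f name=host --format "{{ .ID }}"
if ($netId.Length -lt 1) {
    docker network create -d nat host
    # Assumption: restart rancher-wins only after the network was recreated.
    Get-Service rancher-wins | Restart-Service
}
```

Afterwards, rerunning the failing query from the question inside a Windows pod (Resolve-dnsname www.google.com -server 10.96.0.10) is a quick way to check whether DNS resolution is back.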
-- Gregory Suvalian
Source: StackOverflow