Kube-Proxy-Windows CrashLoopBackOff

5/3/2021

Installation Process

I am new to Kubernetes and am currently setting up a Kubernetes cluster on Azure VMs. I want to deploy Windows containers, and to do that I need to add Windows worker nodes. I have already deployed a kubeadm cluster with 3 master nodes and one Linux worker node, and those nodes work perfectly.

Once I add the Windows node, things go wrong. I use Flannel as my CNI plugin, and I prepared the DaemonSet and control plane according to the Kubernetes documentation: https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/adding-windows-nodes/

After installing the Flannel DaemonSet, I installed kube-proxy and Docker EE accordingly.

Used Software

Master Nodes

OS: Ubuntu 18.04 LTS
Container Runtime: Docker 20.10.5
Kubernetes version: 1.21.0
Flannel-image version: 0.14.0
Kube-proxy version: 1.21.0

Windows Worker Node

OS: Windows Server 2019 Datacenter Core
Container Runtime: Docker 20.10.4
Kubernetes version: 1.21.0
Flannel-image version: 0.13.0-nanoserver
Kube-proxy version: 1.21.0-nanoserver

Wanted Results:

I wanted to see a full cluster ready to use, with all the needed pods in the Running state.

Current Results:

After the installation I checked if the installation was successful:

azureuser@Kube-M-001:~$ kubectl get pods -o wide -n kube-system
NAME                                  READY   STATUS             RESTARTS   AGE    IP           NODE         NOMINATED NODE   READINESS GATES
coredns-558bd4d5db-8mshg              1/1     Running            0          178m   10.244.0.3   kube-m-001   <none>           <none>
coredns-558bd4d5db-xhsmn              1/1     Running            0          178m   10.244.0.2   kube-m-001   <none>           <none>
etcd-kube-m-001                       1/1     Running            0          178m   10.0.10.4    kube-m-001   <none>           <none>
etcd-kube-m-002                       1/1     Running            0          164m   10.0.10.5    kube-m-002   <none>           <none>
etcd-kube-m-003                       1/1     Running            0          162m   10.0.10.6    kube-m-003   <none>           <none>
kube-apiserver-kube-m-001             1/1     Running            0          178m   10.0.10.4    kube-m-001   <none>           <none>
kube-apiserver-kube-m-002             1/1     Running            1          165m   10.0.10.5    kube-m-002   <none>           <none>
kube-apiserver-kube-m-003             1/1     Running            0          162m   10.0.10.6    kube-m-003   <none>           <none>
kube-controller-manager-kube-m-001    1/1     Running            1          178m   10.0.10.4    kube-m-001   <none>           <none>
kube-controller-manager-kube-m-002    1/1     Running            0          165m   10.0.10.5    kube-m-002   <none>           <none>
kube-controller-manager-kube-m-003    1/1     Running            0          163m   10.0.10.6    kube-m-003   <none>           <none>
kube-flannel-ds-5lwzf                 1/1     Running            0          165m   10.0.10.5    kube-m-002   <none>           <none>
kube-flannel-ds-6lvgp                 1/1     Running            0          129m   10.0.10.7    kube-w-001   <none>           <none>
kube-flannel-ds-dlmkt                 1/1     Running            0          163m   10.0.10.6    kube-m-003   <none>           <none>
kube-flannel-ds-h27r7                 1/1     Running            0          169m   10.0.10.4    kube-m-001   <none>           <none>
kube-flannel-ds-windows-amd64-hwbjc   1/1     Running            0          121m   10.0.64.4    kube-w-002   <none>           <none>
kube-proxy-4rkgk                      1/1     Running            0          178m   10.0.10.4    kube-m-001   <none>           <none>
kube-proxy-6g4sb                      1/1     Running            0          129m   10.0.10.7    kube-w-001   <none>           <none>
kube-proxy-tvm9g                      1/1     Running            0          165m   10.0.10.5    kube-m-002   <none>           <none>
kube-proxy-windows-j7c27              0/1     CrashLoopBackOff   26         121m   10.244.4.2   kube-w-002   <none>           <none>
kube-proxy-wzjm7                      1/1     Running            0          163m   10.0.10.6    kube-m-003   <none>           <none>
kube-scheduler-kube-m-001             1/1     Running            1          178m   10.0.10.4    kube-m-001   <none>           <none>
kube-scheduler-kube-m-002             1/1     Running            0          165m   10.0.10.5    kube-m-002   <none>           <none>
kube-scheduler-kube-m-003             1/1     Running            0          162m   10.0.10.6    kube-m-003   <none>           <none>
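
With this many rows it helps to narrow the listing to pods that are not in the Running state. The following is a minimal sketch that runs the filter on three sample rows copied from the listing above (trimmed to the NAME, READY and STATUS columns); on a live cluster the same awk filter could presumably be fed from `kubectl get pods -n kube-system --no-headers` instead. The variable names are only illustrative.

```shell
# Sample rows copied from the listing above (NAME, READY, STATUS columns only)
rows='kube-proxy-tvm9g 1/1 Running
kube-proxy-windows-j7c27 0/1 CrashLoopBackOff
kube-proxy-wzjm7 1/1 Running'

# Keep only rows whose STATUS column (field 3) is not "Running"
not_running=$(printf '%s\n' "$rows" | awk '$3 != "Running" {print $1, $3}')
printf '%s\n' "$not_running"
```

Filtering on the STATUS column rather than on the pod phase matters here, because a pod in CrashLoopBackOff can still report the phase Running.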

I checked the logs of the specific kube-proxy pod and got the following results:

azureuser@Kube-M-001:~$ kubectl logs -n kube-system kube-proxy-windows-j7c27 -p

    Directory: C:\host\var\lib\kube-proxy\var\run\secrets\kubernetes.io

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d----            5/3/2021 12:08 PM                serviceaccount

    Directory: C:\host\k

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d----            5/3/2021 12:24 PM                kube-proxy
Using CNI conf file: 10-flannel.conf
I0503 12:30:23.146002    2448 flags.go:59] FLAG: --add-dir-header="false"
I0503 12:30:23.194891    2448 flags.go:59] FLAG: --alsologtostderr="false"
I0503 12:30:23.194891    2448 flags.go:59] FLAG: --bind-address="0.0.0.0"
I0503 12:30:23.194891    2448 flags.go:59] FLAG: --bind-address-hard-fail="false"
I0503 12:30:23.194891    2448 flags.go:59] FLAG: --cleanup="false"
I0503 12:30:23.194891    2448 flags.go:59] FLAG: --cluster-cidr=""
I0503 12:30:23.194891    2448 flags.go:59] FLAG: --config="/var/lib/kube-proxy/config.conf"
I0503 12:30:23.194891    2448 flags.go:59] FLAG: --config-sync-period="15m0s"
I0503 12:30:23.194891    2448 flags.go:59] FLAG: --conntrack-max-per-core="32768"
I0503 12:30:23.194891    2448 flags.go:59] FLAG: --conntrack-min="131072"
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --conntrack-tcp-timeout-close-wait="1h0m0s"
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --conntrack-tcp-timeout-established="24h0m0s"
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --detect-local-mode=""
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --enable-dsr="false"
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --feature-gates="WinOverlay=true"
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --healthz-bind-address="0.0.0.0:10256"
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --healthz-port="10256"
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --help="false"
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --hostname-override="kube-w-002"
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --iptables-masquerade-bit="14"
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --iptables-min-sync-period="1s"
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --iptables-sync-period="30s"
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --ipvs-exclude-cidrs="[]"
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --ipvs-min-sync-period="0s"
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --ipvs-scheduler=""
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --ipvs-strict-arp="false"
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --ipvs-sync-period="30s"
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --ipvs-tcp-timeout="0s"
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --ipvs-tcpfin-timeout="0s"
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --ipvs-udp-timeout="0s"
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --kube-api-burst="10"
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --kube-api-content-type="application/vnd.kubernetes.protobuf"
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --kube-api-qps="5"
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --kubeconfig=""
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --log-backtrace-at=":0"
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --log-dir=""
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --log-file=""
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --log-file-max-size="1800"
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --log-flush-frequency="5s"
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --logtostderr="true"
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --masquerade-all="false"
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --master=""
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --metrics-bind-address="127.0.0.1:10249"
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --metrics-port="10249"
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --network-name=""
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --nodeport-addresses="[]"
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --one-output="false"
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --oom-score-adj="-999"
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --profiling="false"
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --proxy-mode=""
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --proxy-port-range=""
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --show-hidden-metrics-for-version=""
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --skip-headers="false"
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --skip-log-headers="false"
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --source-vip=""
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --stderrthreshold="2"
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --udp-timeout="250ms"
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --v="6"
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --version="false"
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --vmodule=""
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --windows-service="false"
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --write-config-to=""
I0503 12:30:23.197789    2448 feature_gate.go:243] feature gates: &{map[WinOverlay:true]}
I0503 12:30:23.197789    2448 feature_gate.go:243] feature gates: &{map[WinOverlay:true]}
I0503 12:30:23.200622    2448 loader.go:372] Config loaded from file:  /var/lib/kube-proxy/kubeconfig.conf
I0503 12:30:23.221725    2448 server_windows.go:107] Using Kernelspace Proxier.
I0503 12:30:23.221725    2448 server_windows.go:110] creating dualStackProxier for Windows kernel.
time="2021-05-03T12:30:23Z" level=info msg="currentVersion.Major < versionRange.MinVersion.Major: 9, 12"
time="2021-05-03T12:30:23Z" level=info msg="currentVersion.Major < versionRange.MinVersion.Major: 9, 10"
time="2021-05-03T12:30:23Z" level=info msg="currentVersion.Major < versionRange.MinVersion.Major: 9, 12"
time="2021-05-03T12:30:23Z" level=info msg="currentVersion.Major < versionRange.MinVersion.Major: 9, 13"
time="2021-05-03T12:30:23Z" level=info msg="currentVersion.Major < versionRange.MinVersion.Major: 9, 10"
I0503 12:30:23.224600    2448 proxier.go:562] "Cleaning up old HNS policy lists"
I0503 12:30:33.229568    2448 proxier.go:583] "Hns Network loaded" hnsNetworkInfo=&{name:flannel.4096 id:ae948621-bb34-486d-b31d-cf397757b7c1 networkType:Overlay remoteSubnets:[0xc0000b77c0 0xc0000b7840 0xc0000b78c0 0xc0000b7940]}
time="2021-05-03T12:30:33Z" level=info msg="currentVersion.Major < versionRange.MinVersion.Major: 9, 12"
time="2021-05-03T12:30:33Z" level=info msg="currentVersion.Major < versionRange.MinVersion.Major: 9, 10"
time="2021-05-03T12:30:33Z" level=info msg="currentVersion.Major < versionRange.MinVersion.Major: 9, 12"
time="2021-05-03T12:30:33Z" level=info msg="currentVersion.Major < versionRange.MinVersion.Major: 9, 13"
time="2021-05-03T12:30:33Z" level=info msg="currentVersion.Major < versionRange.MinVersion.Major: 9, 10"
time="2021-05-03T12:30:33Z" level=info msg="currentVersion.Major < versionRange.MinVersion.Major: 9, 12"
time="2021-05-03T12:30:33Z" level=info msg="currentVersion.Major < versionRange.MinVersion.Major: 9, 10"
time="2021-05-03T12:30:33Z" level=info msg="currentVersion.Major < versionRange.MinVersion.Major: 9, 12"
time="2021-05-03T12:30:33Z" level=info msg="currentVersion.Major < versionRange.MinVersion.Major: 9, 13"
time="2021-05-03T12:30:33Z" level=info msg="currentVersion.Major < versionRange.MinVersion.Major: 9, 10"
F0503 12:30:33.256757    2448 server.go:489] unable to create proxier: unable to create ipv4 proxier: Could not find host mac address for 0.0.0.0, hostname: kube-w-002, clusterCIDR : 10.244.0.0/16, nodeIP:0.0.0.0

But I think something already went wrong in the Flannel installation, because the logs of the Flannel pod give the following results:

PS C:\Users\azureuser> docker ps
CONTAINER ID   IMAGE                                          COMMAND                  CREATED       STATUS       PORTS     NAMES
0cfa1c0c7b6d   mcr.microsoft.com/oss/kubernetes/pause:1.4.1   "cmd /S /C pauseloop…"   2 hours ago   Up 2 hours             k8s_POD_kube-proxy-windows-j7c27_kube-system_df8fda84-cf94-4ca7-863a-9c9694f2b3ba_8
fb3ccc5e0cf7   sigwindowstools/flannel                        "pwsh -file /etc/kub…"   2 hours ago   Up 2 hours             k8s_kube-flannel_kube-flannel-ds-windows-amd64-hwbjc_kube-system_9f0aa635-200b-4902-93cc-1d1da7f49a5d_0
bc8e97427613   mcr.microsoft.com/oss/kubernetes/pause:1.4.1   "cmd /S /C pauseloop…"   2 hours ago   Up 2 hours             k8s_POD_kube-flannel-ds-windows-amd64-hwbjc_kube-system_9f0aa635-200b-4902-93cc-1d1da7f49a5d_0
PS C:\Users\azureuser> docker logs fb3ccc5e0cf7

    Directory: C:\host\etc\cni

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d----            5/3/2021 10:28 AM                net.d

    Directory: C:\host\etc

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d----            5/3/2021 10:28 AM                kube-flannel

    Directory: C:\host\opt\cni

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d----            5/3/2021 10:28 AM                bin

    Directory: C:\host\k

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d----            5/3/2021 10:28 AM                flannel

    Directory: C:\host\k\flannel\var\run\secrets\kubernetes.io

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d----            5/3/2021 10:28 AM                serviceaccount
Configuring CNI for docker
WARNING: The names of some imported commands from the module 'hns' include unapproved verbs that might make them less
discoverable. To find the commands with unapproved verbs, run the Import-Module command again with the Verbose
parameter. For a list of approved verbs, type Get-Verb.
Invoke-HnsRequest : @{Error=An adapter was not found. ; ErrorCode=2151350278; Success=False}
At C:\k\flannel\hns.psm1:233 char:16
+ ...      return Invoke-HnsRequest -Method POST -Type networks -Data $Json ...
+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [Write-Error], WriteErrorException
    + FullyQualifiedErrorId : Microsoft.PowerShell.Commands.WriteErrorException,Invoke-HNSRequest

FATA[2021-05-03T10:28:44Z] rpc error: code = Internal desc = could not create IP forward entry: The object already exists.
I0503 10:28:45.340006    5512 main.go:518] Determining IP address of default interface
I0503 10:28:47.695146    5512 main.go:531] Using interface with name Ethernet 2 and address 10.0.64.4
I0503 10:28:47.695146    5512 main.go:548] Defaulting external address to interface address (10.0.64.4)
I0503 10:28:47.767526    5512 kube.go:119] Waiting 10m0s for node controller to sync
I0503 10:28:47.769102    5512 kube.go:306] Starting kube subnet manager
I0503 10:28:48.769283    5512 kube.go:126] Node controller sync successful
I0503 10:28:48.769283    5512 main.go:246] Created subnet manager: Kubernetes Subnet Manager - kube-w-002
I0503 10:28:48.769283    5512 main.go:249] Installing signal handlers
I0503 10:28:48.769283    5512 main.go:390] Found network config - Backend type: vxlan
I0503 10:28:48.769283    5512 vxlan_windows.go:127] VXLAN config: Name=flannel.4096 MacPrefix=0E-2A VNI=4096 Port=4789 GBP=false DirectRouting=false
I0503 10:28:48.838521    5512 device_windows.go:115] Attempting to create HostComputeNetwork &{ flannel.4096 Overlay [] {[]} { [] [] []} [{Static [{10.244.4.0/24 [[123 34 84 121 112 101 34 58 34 86 83 73 68 34 44 34 83 101 116 116 105 110 103 115 34 58 123 34 73 115 111 108 97 116 105 111 110 73 100 34 58 52 48 57 54 125 125]] [{10.244.4.1 0.0.0.0/0 0}]}]}] 8 {2 0}}
E0503 10:28:49.279614    5512 streamwatcher.go:109] Unable to decode an event from the watch stream: read tcp 10.0.64.4:50315-><PUBLIC-IP>:6443: wsarecv: An established connection was aborted by the software in your host machine.
E0503 10:28:49.323566    5512 reflector.go:304] github.com/coreos/flannel/subnet/kube/kube.go:307: Failed to watch *v1.Node: Get "https://kube-lb.eastus.cloudapp.azure.com:6443/api/v1/nodes?resourceVersion=6092&timeoutSeconds=582&watch=true": dial tcp: lookup kube-lb.eastus.cloudapp.azure.com: no such host
I0503 10:28:53.739453    5512 device_windows.go:123] Waiting to get ManagementIP from HostComputeNetwork flannel.4096
I0503 10:28:54.248878    5512 device_windows.go:134] Waiting to get net interface for HostComputeNetwork flannel.4096 (10.0.64.4)
I0503 10:28:54.758966    5512 device_windows.go:148] Created HostComputeNetwork flannel.4096
I0503 10:28:54.804770    5512 main.go:313] Changing default FORWARD chain policy to ACCEPT
I0503 10:28:54.816024    5512 main.go:321] Wrote subnet file to /run/flannel/subnet.env
I0503 10:28:54.816024    5512 main.go:325] Running backend.
I0503 10:28:54.816024    5512 main.go:343] Waiting for all goroutines to exit
I0503 10:28:54.816024    5512 vxlan_network_windows.go:63] Watching for new subnet leases

Can anyone please help me, so that I can use my Windows worker node in the Kubernetes cluster?

Edit 1:

I solved the Flannel FATA error. It was caused by Flannel not being able to identify the network adapter, so before starting Flannel I created the needed network manually:

# First download the HNS PowerShell module (hns.psm1) and import it
PS C:\Users\azureuser> curl.exe -LO https://raw.githubusercontent.com/microsoft/SDN/master/Kubernetes/windows/hns.psm1
PS C:\Users\azureuser> ipmo ./hns.psm1

#Create the network
PS C:\Users\azureuser> New-HNSNetwork -Type Overlay -AddressPrefix "192.168.255.0/30" -Gateway "192.168.255.1" -Name "External" -AdapterName "Ethernet 2" -SubnetPolicies @(@{Type = "VSID"; VSID = 9999; });

After this you can join the Windows node to the cluster and Flannel will start up without a problem, but the kube-proxy problem remains.

-- Twan Veldhuis
cni
flannel
kube-proxy
kubectl
kubernetes

1 Answer

5/7/2021

Are you still having this error? I managed to fix it by downgrading the Windows kube-proxy to v1.20.0. There must be a missing config or a bug in 1.21.0.

curl -L https://github.com/kubernetes-sigs/sig-windows-tools/releases/latest/download/kube-proxy.yml | sed 's/VERSION/v1.20.0/g' | kubectl apply -f -
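
To see what the sed step in the command above does before applying it to the cluster, the substitution can be tried on a single line. This is a minimal sketch: the sample image line is an assumption about how the VERSION placeholder appears in kube-proxy.yml, not a copy of the actual manifest, and nothing is sent to kubectl.

```shell
# Hypothetical sample line standing in for the image reference in kube-proxy.yml
sample='        image: sigwindowstools/kube-proxy:VERSION-nanoserver'

# Same substitution as in the command above, applied to the sample line only
pinned=$(printf '%s\n' "$sample" | sed 's/VERSION/v1.20.0/g')
printf '%s\n' "$pinned"
```

Because the pattern carries the g flag, every occurrence of VERSION in the manifest is replaced, so it is worth checking the downloaded file for other uses of that word first.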
-- StaleMartyr
Source: StackOverflow