Readiness probe fails with connection refused

5/26/2019

I am trying to set up K8S with two Windows nodes (Windows Server 2019). Everything seems to work: the containers run and are reachable through a k8s Service. But as soon as I add a readiness (or liveness) probe, everything fails. The exact error is:

Readiness probe failed: Get http://10.244.1.28:80/test.txt: dial tcp 10.244.1.28:80: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.

When I try the URL from the k8s master, it works and I get a 200. However, I read that the kubelet is what executes the probe, and indeed the URL cannot be reached from the Windows node itself (which seems odd, because the container is running on that same node). I therefore assume the problem is related to network configuration.
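To reproduce what the kubelet sees, the same request can be issued by hand from the Windows node. The following is a minimal Python sketch of an HTTP probe, not the kubelet's actual implementation: it sends a GET with a timeout and treats any status from 200 up to (but not including) 400 as success, which is the rule Kubernetes documents for httpGet probes. The function name `http_probe` and its default timeout are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Tuple


def http_probe(url: str, timeout_seconds: float = 1.0) -> Tuple[bool, str]:
    """Send one GET request and report whether it would count as a passing probe.

    Illustrative helper only (the name is made up for this sketch).
    A status code >= 200 and < 400 is treated as success.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with a 4xx/5xx status.
        status = err.code
    except OSError as err:
        # Covers timeouts, refused connections and unreachable hosts.
        return False, f"probe failed: {err}"
    return 200 <= status < 400, f"HTTP {status}"
```

Run on the Windows node with the values from the error message, for example `http_probe("http://10.244.1.28:80/test.txt", 60)`. If this fails on the node but succeeds on the master, the problem is node-to-pod connectivity rather than the probe definition.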

I have Hyper-V with an External virtual switch configured. K8S is configured to use the flannel overlay (vxlan) network, as instructed here: https://docs.microsoft.com/en-us/virtualization/windowscontainers/kubernetes/network-topologies.

Any idea how to troubleshoot and fix this?

UPDATE: providing the yaml:

apiVersion: v1
kind: Service
metadata:
  name: dummywebapplication
  labels:
    app: dummywebapplication
spec:
  ports:
    # the port that this service should serve on
  - port: 80
    targetPort: 80
  selector:
    app: dummywebapplication
  type: NodePort
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: dummywebapplication
  name: dummywebapplication
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: dummywebapplication
      name: dummywebapplication
    spec:
      containers:
      - name: dummywebapplication
        image: <my image>
        readinessProbe:
          httpGet:
            path: /test.txt
            port: 80
          initialDelaySeconds: 15
          periodSeconds: 30
          timeoutSeconds: 60
      nodeSelector:
        beta.kubernetes.io/os: windows

One more update. This doc (https://kubernetes.io/docs/setup/windows/intro-windows-in-kubernetes/) says:

My Windows node cannot access NodePort service

Local NodePort access from the node itself fails. This is a known limitation. NodePort access works from other nodes or external clients.

I don't know whether this is related, as I could not connect to the container from a different node as stated above. I also tried a Service of type LoadBalancer, but the result was the same.

-- ewolfman
kubernetes

1 Answer

5/28/2019

The network configuration assumption was correct. It seems that with 'overlay', by default, the kubelet on the node cannot reach the container's IP, so the probe keeps failing with timeouts and connection refused messages.

Possible workarounds:

  1. Add an exception to the ExceptionList of the 'OutBoundNAT' policy in the CNI config under C:\k\cni\config on the nodes. This is somewhat tricky if you start the node with start.ps1, because it overwrites this file every time. I had to tweak the 'Update-CNIConfig' function in c:\k\helper.psm1 to re-insert the exception, similar to the 'l2bridge' handling in that file.
  2. Use the 'l2bridge' configuration. It seems that 'overlay' runs with stricter isolation, whereas l2bridge does not.
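
The first workaround amounts to editing a JSON document. Below is a rough Python sketch of that edit, not the actual PowerShell change made in helper.psm1. It assumes the CNI config keeps its policies under `delegate.policies`, each with a `Value` object whose `Type` may be `OutBoundNAT`; check your own file, since that layout is an assumption here. The function name and the CIDR passed to it are hypothetical, and which range needs to be excepted depends on your network.

```python
def add_outbound_nat_exception(cni_config: dict, cidr: str) -> bool:
    """Append `cidr` to the ExceptionList of every OutBoundNAT policy.

    Returns True if the config was modified, False if the CIDR was already
    present (or no OutBoundNAT policy was found). The key layout is an
    assumption about the Windows flannel CNI config, not a documented API.
    """
    changed = False
    # Policies are assumed to live under "delegate"; fall back to the top level.
    delegate = cni_config.get("delegate", cni_config)
    for policy in delegate.get("policies", []):
        value = policy.get("Value", {})
        if value.get("Type") != "OutBoundNAT":
            continue
        exceptions = value.setdefault("ExceptionList", [])
        if cidr not in exceptions:
            exceptions.append(cidr)
            changed = True
    return changed
```

To use it, load the config file with `json.load`, call the function with the range to except, and write the result back with `json.dump` only when it returns True. Because start.ps1 regenerates the file, an edit like this has to be reapplied after each start, or made inside 'Update-CNIConfig' as described above.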
-- ewolfman
Source: StackOverflow