Error: forwarding ports: error upgrading connection: error dialing backend: - Azure Kubernetes Service

2/7/2019

We upgraded our Azure Kubernetes Service cluster to the latest version, 1.12.4. After that, we suddenly noticed that pods and nodes can no longer communicate with each other by private IP:

kubectl get pods -o wide -n kube-system -l component=kube-proxy
NAME               READY     STATUS    RESTARTS   AGE       IP           NODE
kube-proxy-bfhbw   1/1       Running   2          16h       10.0.4.4     aks-agentpool-16086733-1
kube-proxy-d7fj9   1/1       Running   2          16h       10.0.4.35    aks-agentpool-16086733-0
kube-proxy-j24th   1/1       Running   2          16h       10.0.4.97    aks-agentpool-16086733-3
kube-proxy-x7ffx   1/1       Running   2          16h       10.0.4.128   aks-agentpool-16086733-4

As you can see, the node aks-agentpool-16086733-0 has the private IP 10.0.4.35. When we try to check the logs of pods running on this node, we get this error:

Get https://aks-agentpool-16086733-0:10250/containerLogs/emw-sit/nginx-sit-deploy-864b7d7588-bw966/nginx-sit?tailLines=5000&timestamps=true: dial tcp 10.0.4.35:10250: i/o timeout

Tiller (Helm) also runs on this node, and if we try to connect to Tiller from the client PC, we get this error:

shmits-imac:~ andris.shmits01$ helm version
Client: &version.Version{SemVer:"v2.12.3", GitCommit:"eecf22f77df5f65c823aacd2dbd30ae6c65f186e", GitTreeState:"clean"}
Error: forwarding ports: error upgrading connection: error dialing backend: dial tcp 10.0.4.35:10250: i/o timeout

Does anybody have any idea why the pods and nodes lost connectivity by private IP?

-- Andris Smits
azure
kubernetes

2 Answers

2/7/2019

The issue could be with the API server. Did you check the logs from the apiserver pod?

Can you run the command below inside the cluster? Do you get a 200 OK response?

curl -k -v https://10.96.0.1/version
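
A complementary check, as a rough sketch: both errors in the question end in a timeout dialing the kubelet port (10.0.4.35:10250), so it may help to test plain TCP reachability of that port from another node or pod in the same virtual network. The helper name check_port is hypothetical, and the sketch assumes bash and the coreutils timeout command are available. It only shows whether the port is reachable from wherever you run it; it does not reproduce the path the API server itself uses to reach the kubelet.

```shell
#!/usr/bin/env bash
# check_port HOST PORT [TIMEOUT_SECONDS]
# Hypothetical helper: returns 0 if a TCP connection to HOST:PORT can be
# opened within the timeout, and non-zero otherwise (refused or timed out).
check_port() {
  local host="$1" port="$2" secs="${3:-5}"
  # bash's /dev/tcp pseudo-device opens a TCP socket on redirection;
  # timeout kills the attempt if the connection hangs.
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Example, using the node IP and kubelet port from the question:
#   check_port 10.0.4.35 10250 5 && echo "reachable" || echo "unreachable"
```

If the port is unreachable from other nodes as well, that points at node-level networking rather than at Tiller or the individual pods.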

-- P Ekambaram
Source: StackOverflow

2/8/2019

After we scaled the cluster down from 4 nodes to 2, the problem disappeared. When we scaled back up from 2 nodes to 4, everything worked fine.
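
For reference, a scale-down and scale-up cycle like this can be sketched with the Azure CLI command az aks scale. The resource group and cluster names below are placeholders, not the real ones, and the run, scale_cluster and recycle_nodes helpers are hypothetical wrappers added for illustration. On clusters with more than one node pool, az aks scale also needs a --nodepool-name argument, which this sketch leaves out.

```shell
#!/usr/bin/env bash
# Placeholder names (assumptions); replace with your own values.
RESOURCE_GROUP="${RESOURCE_GROUP:-my-resource-group}"
CLUSTER_NAME="${CLUSTER_NAME:-my-aks-cluster}"

# Run a command, or just print it when DRY_RUN=1 is set.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Scale the cluster to the given node count.
scale_cluster() {
  run az aks scale \
    --resource-group "$RESOURCE_GROUP" \
    --name "$CLUSTER_NAME" \
    --node-count "$1"
}

# Scale down to 2 nodes, then back up to 4, as described above.
recycle_nodes() {
  scale_cluster 2 && scale_cluster 4
}

# Preview the commands without touching the cluster:
#   DRY_RUN=1 recycle_nodes
```

Pods on the nodes that get removed are rescheduled onto the remaining ones, so expect a brief disruption while the cluster scales down.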

-- Andris Smits
Source: StackOverflow