GKE inter-cluster access to a service does not work consistently: the source IP alternates between node IP and pod IP depending on the cluster

7/17/2021

I have the following GKE regional clusters in the same region (say, us-east1), but each in a different VPC network.

  • Cluster-A is running k8s version 1.18.18-gke.1100 (master and worker nodes)
  • Cluster-B is running k8s version 1.18.18-gke.1100 (master and worker nodes)
  • Cluster-C is running k8s version 1.18.18-gke.1100 on the master and 1.17-1100 on the worker nodes

I have deployed the same Kubernetes service on Cluster-B and Cluster-C and exposed each via a NodePort. I have created VPC peering between Cluster-A and Cluster-B, and similarly between Cluster-A and Cluster-C.

NAME          TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)       
serviceb      NodePort    10.2xx.xxx.xx   <none>        8100:31000/TCP

NAME          TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)       
servicec      NodePort    10.3x.xxx.xx    <none>        8100:31000/TCP
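
For reference, a rough sketch of how the service and peering were created; the peering and network names below are placeholders, not the real ones:

# Sketch only: names and networks are placeholders.
# NodePort service matching the listing above (port 8100, nodePort 31000):
kubectl create service nodeport serviceb --tcp=8100:8100 --node-port=31000

# VPC peering must be created from both networks to become ACTIVE:
gcloud compute networks peerings create a-to-b \
    --network=network-a --peer-network=network-b
gcloud compute networks peerings create b-to-a \
    --network=network-b --peer-network=network-a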

There is a firewall rule that allows the node internal IPs as well as the pod IP range of Cluster-A to reach the service port of Cluster-B. Similarly, there is a firewall rule that allows the node internal IPs as well as the pod IP range of Cluster-A to reach the service port of Cluster-C.
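
The rules are roughly of this shape (rule names, network names, and CIDR ranges are placeholders for the real values):

# Sketch only: names and source ranges are placeholders.
gcloud compute firewall-rules create allow-a-to-b-nodeport \
    --network=network-b --direction=INGRESS --action=ALLOW \
    --rules=tcp:31000 \
    --source-ranges=NODE_CIDR_A,POD_CIDR_A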

My solution requires one pod on Cluster-A to connect to the above-mentioned service on Cluster-B and also to the one on Cluster-C.

I have an FQDN, myserviceb.x.com, that maps to the internal IP of the node hosting the pod behind serviceb. Similarly, another FQDN, myservicec.x.com, maps to the internal node IP for servicec.
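
The DNS mapping itself is just an A record; assuming Cloud DNS purely for illustration (the zone name and IP below are placeholders), the equivalent would be:

# Illustration only, assuming Cloud DNS: zone name and IP are placeholders.
gcloud dns record-sets transaction start --zone=internal-zone
gcloud dns record-sets transaction add NODE_INTERNAL_IP \
    --name=myserviceb.x.com. --ttl=300 --type=A --zone=internal-zone
gcloud dns record-sets transaction execute --zone=internal-zone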

Connectivity was fine until a few weeks ago, when Cluster-A's nodes were still on k8s version 1.11. After I upgraded them to 1.18 (and hence the service restarted), the pod has been unable to reach Cluster-B. This could be purely coincidental, but it is a variable I am considering.

So for my test, I ran tcpdump on the node of Cluster-A while executing a curl command from within the test pod. I am noticing that curl https://myserviceb.x.com:31000 sends the request from the pod IP range (note that I have also added this range to the firewall rule to accept connections):

* Rebuilt URL to: https://myserviceb.x.com:31000/
*   Trying 172.xx.x.20...
* TCP_NODELAY set
* connect to 172.xx.x.20 port 31000 failed: Connection timed out

curl https://myservicec.x.com:31000 sends the request from the NODE IP range and works.
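
For completeness, the capture on the Cluster-A node was done with something along these lines (the interface name is a placeholder, not necessarily what the node uses):

# Sketch only: the interface name is a placeholder.
# Watch outbound traffic to the NodePort and note the source IP on each SYN.
sudo tcpdump -ni eth0 'tcp port 31000'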

Any input would be appreciated. I do not have any network policies in any of these clusters, and VPC-native traffic routing is disabled on all three clusters.
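
For reference, both of those points can be double-checked with something like the following (cluster name and region are placeholders):

# Sketch only: cluster name and region are placeholders.
# Empty or False output means VPC-native (IP alias) routing is disabled:
gcloud container clusters describe cluster-a --region=us-east1 \
    --format='value(ipAllocationPolicy.useIpAliases)'
# "No resources found" means no NetworkPolicy objects in any namespace:
kubectl get networkpolicies --all-namespaces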

-- Jay
google-kubernetes-engine
kubernetes
kubernetes-pod

0 Answers