I created a k8s cluster with kubeadm on GCP (1 master + 2 workers), and everything seems to be fine, except the pod-to-pod communication.
So, first things first: there are no visible issues in the cluster. All pods are running; no errors, no CrashLoopBackOffs, no pending pods.
I set up the following scenario for the tests:
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE
default bb-9bd94cf6f-b5cj5 1/1 Running 1 19h 192.168.2.3 worker-node-1
default curler-7668c66bf5-6c6v8 1/1 Running 1 20h 192.168.2.2 worker-node-1
default curler-master-5b86858f9f-c6zhq 1/1 Running 0 18h 192.168.0.6 master-node
default nginx-5c7588df-x42vt 1/1 Running 0 19h 192.168.2.4 worker-node-1
default nginy-6d77947646-4r4rl 1/1 Running 0 20h 192.168.1.4 worker-node-2
kube-system calico-node-9v98k 2/2 Running 0 97m 10.240.0.7 master-node
kube-system calico-node-h2px8 2/2 Running 0 97m 10.240.0.9 worker-node-2
kube-system calico-node-qjn5t 2/2 Running 0 97m 10.240.0.8 worker-node-1
kube-system coredns-86c58d9df4-gckhl 1/1 Running 0 97m 192.168.1.9 worker-node-2
kube-system coredns-86c58d9df4-wvt2n 1/1 Running 0 97m 192.168.2.6 worker-node-1
kube-system etcd-master-node 1/1 Running 0 97m 10.240.0.7 master-node
kube-system kube-apiserver-master-node 1/1 Running 0 97m 10.240.0.7 master-node
kube-system kube-controller-manager-master-node 1/1 Running 0 97m 10.240.0.7 master-node
kube-system kube-proxy-2g85h 1/1 Running 0 97m 10.240.0.8 worker-node-1
kube-system kube-proxy-77pq4 1/1 Running 0 97m 10.240.0.9 worker-node-2
kube-system kube-proxy-bbd2d 1/1 Running 0 97m 10.240.0.7 master-node
kube-system kube-scheduler-master-node 1/1 Running 0 97m 10.240.0.7 master-node
And these are the services:
$ kubectl get svc --all-namespaces
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 21h
default nginx ClusterIP 10.109.136.120 <none> 80/TCP 20h
default nginy NodePort 10.101.111.222 <none> 80:30066/TCP 20h
kube-system calico-typha ClusterIP 10.111.238.0 <none> 5473/TCP 21h
kube-system kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP 21h
The nginx and nginy services point to the nginx-xxx and nginy-xxx pods respectively, which run nginx. The curlers are pods with curl and ping; one of them is running on the master node, and the other one on worker-node-1. If I exec into the curler pod running on worker-node-1 (curler-7668c66bf5-6c6v8) and curl the nginx pod on the same node, it works fine.
$ kubectl exec -it curler-7668c66bf5-6c6v8 sh
/ # curl 192.168.2.4 -I
HTTP/1.1 200 OK
Server: nginx/1.15.12
Date: Tue, 07 May 2019 10:59:06 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Tue, 16 Apr 2019 13:08:19 GMT
Connection: keep-alive
ETag: "5cb5d3c3-264"
Accept-Ranges: bytes
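A direct way to test the cross-node path, independent of DNS, would be to curl the nginy pod on worker-node-2 by its pod IP (192.168.1.4 in the listing above); I did not capture that output here, but with cross-node pod traffic broken it should hang rather than return headers:
/ # curl 192.168.1.4 -I --max-time 5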
If I try to reach nginx through the service name instead, it only half works, because of how coredns is spread out: one replica runs on worker-node-1 and the other on worker-node-2. I believe that when the request goes to the coredns pod running on worker-node-1 it works, but when it goes to the one on worker-node-2, it doesn't.
/ # curl nginx -I
curl: (6) Could not resolve host: nginx
/ # curl nginx -I
HTTP/1.1 200 OK
Server: nginx/1.15.12
Date: Tue, 07 May 2019 11:06:13 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Tue, 16 Apr 2019 13:08:19 GMT
Connection: keep-alive
ETag: "5cb5d3c3-264"
Accept-Ranges: bytes
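To confirm that suspicion, each coredns pod can be queried directly from the curler pod by its pod IP (192.168.2.6 on worker-node-1 and 192.168.1.9 on worker-node-2, taken from the listing above), assuming nslookup is available in the curler image; the local one should answer while the remote one should time out if cross-node traffic is blocked:
/ # nslookup nginx.default.svc.cluster.local 192.168.2.6
/ # nslookup nginx.default.svc.cluster.local 192.168.1.9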
So my pod-to-pod communication across nodes is definitely not working. I checked the logs of the calico daemonset pods, but found nothing suspicious. I do have some suspicious lines in the kube-proxy pod logs though:
$ kubectl logs kube-proxy-77pq4 -n kube-system
W0507 09:16:51.305357 1 server_others.go:295] Flag proxy-mode="" unknown, assuming iptables proxy
I0507 09:16:51.315528 1 server_others.go:148] Using iptables Proxier.
I0507 09:16:51.315775 1 server_others.go:178] Tearing down inactive rules.
E0507 09:16:51.356243 1 proxier.go:563] Error removing iptables rules in ipvs proxier: error deleting chain "KUBE-MARK-MASQ": exit status 1: iptables: Too many links.
I0507 09:16:51.648112 1 server.go:464] Version: v1.13.1
I0507 09:16:51.658690 1 conntrack.go:52] Setting nf_conntrack_max to 131072
I0507 09:16:51.659034 1 config.go:102] Starting endpoints config controller
I0507 09:16:51.659052 1 controller_utils.go:1027] Waiting for caches to sync for endpoints config controller
I0507 09:16:51.659076 1 config.go:202] Starting service config controller
I0507 09:16:51.659083 1 controller_utils.go:1027] Waiting for caches to sync for service config controller
I0507 09:16:51.759278 1 controller_utils.go:1034] Caches are synced for endpoints config controller
I0507 09:16:51.759291 1 controller_utils.go:1034] Caches are synced for service config controller
Can anyone tell me if the issue could be due to a kube-proxy misconfiguration of iptables? Or point out anything I am missing?
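If it helps, the rules kube-proxy programs for the nginx service can be inspected directly on a node (this assumes SSH access to worker-node-1 and the iptables proxy mode shown in the logs above):
$ sudo iptables -t nat -L KUBE-SERVICES -n | grep nginx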
The issue was resolved by the original poster, with the following solution:
The issue was that I had to allow IP-in-IP (IPIP) communication in my GCP firewall rules. Now it works.
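For reference, a firewall rule along these lines allows IPIP (IP protocol 4) between the nodes; the network name k8s-net and the source range 10.240.0.0/24 are assumptions based on the node IPs in the listing above and need to match the actual VPC:
$ gcloud compute firewall-rules create allow-ipip \
    --network k8s-net \
    --allow ipip \
    --source-ranges 10.240.0.0/24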