Intermittent "connection refused" between services

2/5/2018

I'm running ~200 pods across three n1-standard-4 GKE nodes. Traffic levels are low, so there's plenty of spare CPU and RAM on each machine. Frequently, when services attempt to connect to one another, the connection fails with "CONNECTION REFUSED". After a few retries the connections succeed.
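To put a number on how often this happens, a small probe along these lines could be run from inside a pod. It is only a minimal sketch: the function name `connect_with_retries`, the retry count, and the backoff delays are my own illustrative choices, and the host and port would be whatever service is being called.

```python
import socket
import time


def connect_with_retries(host, port, attempts=5, delay=0.2, timeout=2.0):
    """Open a TCP connection, retrying only on "connection refused".

    Returns the number of refused attempts that preceded the first
    successful connection. Raises ConnectionRefusedError if every
    attempt is refused. Other errors (timeouts, DNS failures) are
    deliberately not caught, so they stay visible.
    """
    refused = 0
    for attempt in range(attempts):
        try:
            # create_connection resolves the name and performs the TCP handshake
            with socket.create_connection((host, port), timeout=timeout):
                return refused
        except ConnectionRefusedError:
            refused += 1
            if attempt < attempts - 1:
                # exponential backoff: delay, 2*delay, 4*delay, ...
                time.sleep(delay * (2 ** attempt))
    raise ConnectionRefusedError(
        f"{host}:{port} refused {refused} consecutive attempts"
    )
```

Logging the returned count per call over time would show whether the refusals cluster on particular nodes or target services.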

Looking at the machines, on two of them netstat -i reports quite a few TX-DRP (dropped transmit packets) on the virtual interfaces.
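For comparing the nodes, the TX-DRP counts can be pulled out of the netstat -i output with something like the following. This is a rough sketch that assumes the usual net-tools layout (a header row containing a TX-DRP column, interface name in the first column); `interfaces_with_tx_drops` is just a name I made up.

```python
def interfaces_with_tx_drops(netstat_output):
    """Return {interface: TX-DRP count} for interfaces with nonzero TX-DRP.

    Expects the text printed by `netstat -i`. The TX-DRP column is found
    by name in the header row, because its position differs between
    netstat versions.
    """
    rows = [line.split() for line in netstat_output.splitlines() if line.strip()]
    # Find the header row, i.e. the first row that names a TX-DRP column
    header_idx = next((i for i, cols in enumerate(rows) if "TX-DRP" in cols), None)
    if header_idx is None:
        return {}
    col = rows[header_idx].index("TX-DRP")
    drops = {}
    for cols in rows[header_idx + 1:]:
        # Skip short or non-numeric rows rather than failing on them
        if len(cols) > col and cols[col].isdigit() and int(cols[col]) > 0:
            drops[cols[0]] = int(cols[col])
    return drops
```

Feeding it the captured output from each node (for example via `subprocess.run(["netstat", "-i"], capture_output=True, text=True).stdout`) gives a per-interface drop count that can be sampled repeatedly to see whether the counters are still climbing.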

I presume I'm running out of some kind of resource. Any ideas what this could be or how I can go about diagnosing/correcting it?

kubectl get po --namespace=kube-system -a

NAME                                                  READY   STATUS    RESTARTS   AGE
event-exporter-v0.1.7-5c4d9556cf-ndvrp                0/2     Evicted   0          33d
event-exporter-v0.1.7-5c4d9556cf-nr9z4                2/2     Running   0          19d
fluentd-gcp-v2.0.9-4cfhb                              2/2     Running   0          8d
fluentd-gcp-v2.0.9-hwp99                              2/2     Running   16         33d
fluentd-gcp-v2.0.9-v9zg4                              2/2     Running   16         33d
heapster-v1.4.3-699fc4bd5b-btgfk                      3/3     Running   1          8d
kube-dns-778977457c-b97cw                             3/3     Running   30         33d
kube-dns-778977457c-gpnj2                             3/3     Running   65         33d
kube-dns-autoscaler-7db47cb9b7-w5mph                  1/1     Running   8          33d
kube-proxy-gke-cluster-1-default-pool-522e7bcf-8h06   1/1     Running   8          33d
kube-proxy-gke-cluster-1-default-pool-522e7bcf-8p9w   1/1     Running   0          8d
kube-proxy-gke-cluster-1-default-pool-522e7bcf-kr1m   1/1     Running   8          33d
l7-default-backend-6497bcdb4d-zbvrn                   1/1     Running   33         33d
tiller-deploy-5b9d65c7f-drmsg                         1/1     Running   0          19d
tiller-deploy-5b9d65c7f-pdmp5                         0/1     Evicted   0          31d

kubectl get deployment --namespace=kube-system

NAME                    DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
event-exporter-v0.1.7   1         1         1            1           33d
heapster-v1.4.3         1         1         1            1           33d
kube-dns                2         2         2            2           33d
kube-dns-autoscaler     1         1         1            1           33d
l7-default-backend      1         1         1            1           33d
tiller-deploy           1         1         1            1           31d

-- Philip Pearl
google-kubernetes-engine
kubernetes

0 Answers