Connection failure between services on Kubernetes worker nodes

9/17/2019

I have Node.js services running on an AWS EKS cluster. The cluster has two worker nodes, and the services connect to each other over the cluster's internal DNS.

The problem is that sometimes, when I re-create a deployment, a service fails to connect to another service that runs on the other worker node.

Example: service 1 runs on worker node 1 and service 2 runs on worker node 2. Service 1 is exposed as service1:3001 (internal DNS), but when service 2 tries to connect to it, the connection fails.
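A quick way to confirm whether DNS resolution is the failing piece is something like the following (the busybox image tag, service name, and port are just examples from my setup):

# start a throwaway pod inside the cluster
kubectl run dns-test --rm -it --image=busybox:1.28 --restart=Never -- sh
# then, inside that pod, test resolution and connectivity to service1
nslookup service1
wget -qO- -T 5 http://service1:3001/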

Here is what I have done to work around it from time to time:

Solution 1: Update the AWS control plane and cluster add-ons

eksctl utils update-kube-proxy --name acceptance --approve
eksctl utils update-aws-node --name acceptance --approve
eksctl utils update-coredns --name acceptance --approve
eksctl update cluster --name acceptance --approve

Note: I did this once, when solutions 2 and 3 did not work.
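A sanity check after running these could be to confirm that kube-proxy, aws-node, and CoreDNS all came back up cleanly (a generic check, nothing specific to my cluster):

kubectl -n kube-system get daemonset kube-proxy aws-node
kubectl -n kube-system get deployment coredns
kubectl -n kube-system get pods -o wide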

Solution 2: Delete the CoreDNS pods and let them restart by themselves.

kubectl -n kube-system delete pod coredns-workernode-1 coredns-workernode-2

Note: I only do this if solution 3 does not work.
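A less manual way to do the same restart is to use the label selector CoreDNS pods carry on EKS (assuming the standard k8s-app=kube-dns label), and to look at the logs for errors first:

# check CoreDNS logs for errors before restarting
kubectl -n kube-system logs -l k8s-app=kube-dns --tail=50
# delete all CoreDNS pods; the deployment recreates them
kubectl -n kube-system delete pod -l k8s-app=kube-dns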

Solution 3: Restart the service again after re-creating the deployment.
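By "restart" I mean forcing the deployment to recreate its pods, roughly like this (the deployment name and app label here are placeholders for my actual ones):

# recreate the pods of the calling service
kubectl delete pod -l app=service2
# or scale the deployment down and back up
kubectl scale deployment service2 --replicas=0
kubectl scale deployment service2 --replicas=1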

These are the workarounds I have used so far for this connection failure between services.

Note: This does not happen when services run on the same worker node.
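A check that could narrow this down further is to hit the other pod's IP directly from a pod on the other worker node, bypassing DNS: if the pod IP works but service1:3001 does not, that points at CoreDNS; if the pod IP also fails, that points at node-to-node networking (for example security groups). The pod names below are placeholders, and this assumes the container has wget available:

# find pod IPs and the node each pod is scheduled on
kubectl get pods -o wide
# from a service-2 pod on the other node, connect to service 1's pod IP directly
kubectl exec -it <service2-pod> -- wget -qO- -T 5 http://<service1-pod-ip>:3001/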

More info:

worker node AMI: ami-0b7127e7a2a38802a
EC2 Type: t2.medium
Kubernetes version: 1.13.10

These all seem to be temporary fixes, and it looks like CoreDNS is not working properly. I have run out of ideas for troubleshooting this permanently.

-- Sarasa Gunawardhana
amazon-ec2
amazon-eks
amazon-web-services
aws-eks
kubernetes

0 Answers