EKS nodes behind an ELB are OutOfService

4/27/2019

I have an EKS cluster with an ELB and 3 worker nodes attached to it. The application runs inside a container on NodePort 30590, and the ELB health check is configured on the same port, 30590. kube-proxy is listening on this port, but the worker nodes show up as OutOfService behind the ELB.
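
For context, a quick way to confirm that kube-proxy really owns the NodePort and that it answers locally (a sketch; the HTTP assumption is mine, adjust if the app is not HTTP):

# On a worker node: confirm something is bound to NodePort 30590
sudo ss -lntp | grep 30590

# Probe the NodePort locally; an immediate reset here (rather than a response)
# usually points at the backend, not at the ELB or security groups
curl -v http://localhost:30590/

What I have tried so far: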

  1. Disabled the source/destination check on the worker nodes.
  2. Disabled rp_filter with "echo 0 | sudo tee /proc/sys/net/ipv4/conf/{all,eth0,eth1,eth2}/rp_filter".
  3. Output of 'sudo iptables -vL' (filter table only; a nat-table check is sketched after this list):
 pkts bytes target     prot opt in     out     source               destination         
13884  826K KUBE-EXTERNAL-SERVICES  all  --  any    any     anywhere             anywhere             ctstate NEW /* kubernetes externally-visible service portals */
2545K 1268M KUBE-FIREWALL  all  --  any    any     anywhere             anywhere            

Chain FORWARD (policy ACCEPT 92 packets, 28670 bytes)
 pkts bytes target     prot opt in     out     source               destination         
1307K  409M KUBE-FORWARD  all  --  any    any     anywhere             anywhere             /* kubernetes forwarding rules */
1301K  409M DOCKER-USER  all  --  any    any     anywhere             anywhere            

Chain OUTPUT (policy ACCEPT 139 packets, 12822 bytes)
 pkts bytes target     prot opt in     out     source               destination         
 349K   21M KUBE-SERVICES  all  --  any    any     anywhere             anywhere             ctstate NEW /* kubernetes service portals */
2443K  222M KUBE-FIREWALL  all  --  any    any     anywhere             anywhere            

Chain DOCKER (0 references)
 pkts bytes target     prot opt in     out     source               destination         

Chain DOCKER-ISOLATION-STAGE-1 (0 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 RETURN     all  --  any    any     anywhere             anywhere            

Chain DOCKER-ISOLATION-STAGE-2 (0 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 RETURN     all  --  any    any     anywhere             anywhere            

Chain DOCKER-USER (1 references)
 pkts bytes target     prot opt in     out     source               destination         
1301K  409M RETURN     all  --  any    any     anywhere             anywhere            

Chain KUBE-EXTERNAL-SERVICES (1 references)
 pkts bytes target     prot opt in     out     source               destination         

Chain KUBE-FIREWALL (2 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 DROP       all  --  any    any     anywhere             anywhere             /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000

Chain KUBE-FORWARD (1 references)
 pkts bytes target     prot opt in     out     source               destination         
    3   180 ACCEPT     all  --  any    any     anywhere             anywhere             /* kubernetes forwarding rules */ mark match 0x4000/0x4000

Chain KUBE-SERVICES (1 references)
 pkts bytes target     prot opt in     out     source               destination

  4. Output of 'sudo tcpdump -i eth0 port 30590':
12:41:44.217236 IP ip-192-168-186-107.ec2.internal.22580 > ip-x-x-x-.ec2.internal.30590: Flags [S], seq 3790958206, win 29200, options [mss 1460,sackOK,TS val 10236779 ecr 0,nop,wscale 8], length 0
12:41:44.217834 IP ip-x-x-x-.ec2.internal.30590 > ip-192-168-186-107.ec2.internal.22580: Flags [R.], seq 0, ack 3790958207, win 0, length 0 
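
A note on step 3: 'sudo iptables -vL' only prints the filter table, while the NodePort DNAT rules that kube-proxy programs live in the nat table. A sketch of how to look at those (assuming kube-proxy is running in its default iptables mode):

# NodePort rules sit in the nat table, in the KUBE-NODEPORTS chain
sudo iptables -t nat -vL KUBE-NODEPORTS

# Or just search the whole nat table for the port
sudo iptables -t nat -L -n | grep 30590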

It looks like the EKS nodes are sending TCP RSTs back to the ELB, which is why they are failing the ELB health checks. Can anyone help me troubleshoot this?
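
An immediate RST from the NodePort suggests the traffic is reaching the node and being reset by the backend, rather than being dropped by a firewall, so my next step is to verify the Service-to-pod port mapping. A sketch of those checks (my-app and the pod name are placeholders, not my actual resource names):

# Does the Service's port/targetPort/nodePort mapping look right?
kubectl describe svc my-app

# Which pod IP:port pairs is the Service actually pointing at?
kubectl get endpoints my-app -o wide

# Is the container listening on that targetPort? (only if the image ships netstat)
kubectl exec <pod-name> -- netstat -lntp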

-- Sandy
amazon-eks
amazon-elb
amazon-web-services
kubernetes
tcp

1 Answer

4/28/2019

Found the solution :) The issue was in the replicationcontroller.json file: I had specified the wrong port to be exposed, while trying to connect on a different port.
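
For anyone who hits the same symptom: the containerPort the pod listens on, the Service's targetPort, and the nodePort the ELB health-checks all have to line up. A minimal sketch of the Service side (the name my-app and port 8080 are assumptions, not my actual manifests):

kubectl apply -f - <<'EOF'
{
  "apiVersion": "v1",
  "kind": "Service",
  "metadata": { "name": "my-app" },
  "spec": {
    "type": "NodePort",
    "selector": { "app": "my-app" },
    "ports": [
      { "port": 80, "targetPort": 8080, "nodePort": 30590 }
    ]
  }
}
EOF

Here targetPort 8080 has to match the containerPort the application actually listens on in the ReplicationController's pod template, and nodePort 30590 is what the ELB health check hits.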

-- Sandy
Source: StackOverflow