Unable to communicate between pods running on different nodes in Kubernetes

7/2/2018

I have been building a distributed load testing application using Kubernetes and Locust (similar to this).

I currently have a multi-node cluster running on bare metal (Ubuntu 18.04, set up using kubeadm, with Flannel as my pod networking add-on).

The architecture of my cluster is as follows:

  • I have a 'master instance' of the Locust application running on my master node.
  • I have 'slave instances' of the Locust application running on all of my other nodes. These slave instances must be able to connect to a port (5558 by default) on the master instance.

As of now, I don't believe that is happening. My cluster shows that all of my deployments are healthy and running; however, I am unable to access the logs of any of the slave instances running on nodes other than my master node. This leads me to believe that my pods are unable to communicate with each other across nodes.
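For reference, the commands I'm using look roughly like this (the pod name is a placeholder for one of my actual slave pods):

kubectl get pods -o wide          # everything shows as Running, spread across nodes
kubectl logs locust-slave-xxxxx   # fails for pods scheduled on nodes other than the master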

Is this an issue with my networking or deployment setup (I followed the linked guides pretty much verbatim)? Where should I start debugging this issue?

-- whiletrue
docker
kubernetes
locust

3 Answers

7/3/2018

Based on your description of the problem, my guess is that you have a connectivity problem caused by a firewall or network misconfiguration.

From the network perspective, there are requirements mentioned in the Kubernetes documentation:

  • all containers can communicate with all other containers without NAT
  • all nodes can communicate with all containers (and vice-versa) without NAT
  • the IP that a container sees itself as is the same IP that others see it as
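A quick way to check the first requirement is to start two test pods, confirm with -o wide that they landed on different nodes, and ping one from the other. A minimal sketch (the pod IP at the end is a placeholder; replace it with the IP kubectl reports for test-b):

kubectl run test-a --image=busybox --restart=Never -- sleep 3600
kubectl run test-b --image=busybox --restart=Never -- sleep 3600
kubectl get pods -o wide                        # note each pod's IP and the node it landed on
kubectl exec test-a -- ping -c 3 10.244.1.12    # replace with test-b's pod IP

If the ping fails only when the pods are on different nodes, the overlay network (Flannel here) is the place to look.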

From the firewall perspective, you need to ensure that cluster traffic can pass through the firewall on the nodes.

Here is the list of ports you should have open on the nodes, as provided by the CoreOS website:

Master node inbound: TCP: 443  from Worker Nodes, API Requests, and End-Users
                     UDP: 8285,8472 from Master & Worker Nodes


Worker node inbound: TCP: 10250 from Master Nodes
                     TCP: 10255 from Heapster
                     TCP: 30000-32767 from External Application Consumers
                     TCP: 1-32767 from Master & Worker Nodes
                     TCP: 179 from Worker Nodes
                     UDP: 8472 from Master & Worker Nodes
                     UDP: 179 from Worker Nodes

Etcd node inbound:  TCP: 2379-2380 from Master & Worker Nodes
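Note that kubectl logs for a pod on another node requires the API server to reach that node's kubelet on TCP 10250, which matches the symptom you describe. On Ubuntu the firewall front-end is typically ufw; a sketch of opening the most relevant ports might look like this (it assumes Flannel's VXLAN backend on UDP 8472; adjust to your setup):

# on the master node
ufw allow 6443/tcp          # API server (kubeadm default; the table above assumes 443)
ufw allow 8472/udp          # Flannel VXLAN overlay traffic
# on each worker node
ufw allow 10250/tcp         # kubelet API (needed for kubectl logs/exec)
ufw allow 8472/udp          # Flannel VXLAN overlay traffic
ufw allow 30000:32767/tcp   # NodePort service range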
-- VAS
Source: StackOverflow

7/3/2018

How do the slave instances try to reach the master instance? You have to create a Service for the master (with matching labels) so the slave pods can access the master pod. Also, make sure your SDN is up and the master is reachable from the slave instances. You can test this by running telnet to the master pod's IP from a slave instance.
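For example, a minimal Service for the master could look like the sketch below (the name, label, and port are assumptions based on the question; adjust them to match your actual manifests):

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: locust-master
spec:
  selector:
    app: locust-master
  ports:
  - name: locust-comm
    port: 5558
    targetPort: 5558
EOF

The slaves can then reach the master via the DNS name locust-master. To test connectivity (assuming telnet is available in the slave image, and with a placeholder pod name):

kubectl exec -it locust-slave-xxxxx -- telnet locust-master 5558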

-- Akash Sharma
Source: StackOverflow

7/3/2018

Check that IP forwarding is enabled on all the nodes:

# sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 1

If it is not, enable it like this and test again:

echo 1 > /proc/sys/net/ipv4/ip_forward
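
Note that writing to /proc takes effect immediately but does not survive a reboot; to make the setting persistent, one common approach is:

echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.conf
sysctl -p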
-- sfgroups
Source: StackOverflow