Kubernetes - Can't connect to a service IP from the service's pod

1/22/2016

I'm trying to create 3 instances of Kafka and deploy them to a local Kubernetes setup. Because each instance needs some specific configuration, I'm creating one RC and one service for each - eagerly waiting for #18016 ;)
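For reference, each per-broker Service looks roughly like this (the name, label, and port are assumptions, since the actual manifests aren't shown here):

```yaml
# Hypothetical per-broker Service; one of these exists for each of kafka1..kafka3.
apiVersion: v1
kind: Service
metadata:
  name: kafka1
spec:
  selector:
    app: kafka1        # assumed label on the pods created by the kafka1 RC
  ports:
    - port: 9092       # Kafka's default broker port
```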

However, I'm having problems because Kafka can't establish a network connection to itself when it uses the service IP (a Kafka broker tries to do this when it is exchanging replication messages with other brokers). For example, let's say I have two worker hosts (172.17.8.201 and 172.17.8.202) and my pods are scheduled like this:

  • Host 1 (172.17.8.201)

    • kafka1 pod (10.2.16.1)
  • Host 2 (172.17.8.202)

    • kafka2 pod (10.2.68.1)
    • kafka3 pod (10.2.68.2)

In addition, let's say I have the following service IPs:

  • kafka1 cluster IP: 11.1.2.96
  • kafka2 cluster IP: 11.1.2.120
  • kafka3 cluster IP: 11.1.2.123

The problem happens when the kafka1 pod (container) tries to send a message (to itself) using the kafka1 cluster IP (11.1.2.96). For some reason, the connection cannot be established and the message is not sent.

Some more information: if I manually connect to the kafka1 pod, I can telnet to the kafka2 and kafka3 pods using their respective cluster IPs (11.1.2.120 / 11.1.2.123). Likewise, from the kafka2 pod I can connect to both the kafka1 and kafka3 pods using 11.1.2.96 and 11.1.2.123. Finally, I can connect to all pods (from all pods) if I use the pod IPs.

It is important to emphasize that I shouldn't have to tell the Kafka brokers to use the pod IPs instead of the cluster IPs for replication. As it stands, Kafka uses for replication whatever IP you configure to be "advertised" - which is the same IP that clients use to connect to the brokers. Even if I could work around this, I believe the same problem could appear with other software as well.
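For context, the advertised address is set in the broker's server.properties (these are the property names from the Kafka 0.8/0.9 era in use at the time; the values below are illustrative, not taken from the actual setup):

```
# server.properties (illustrative values)
# The address replication and clients both use - here the kafka1 service cluster IP:
advertised.host.name=11.1.2.96
advertised.port=9092
```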

This problem seems to happen only with the combination I am using, because the exact same files work correctly on GCE. Right now, I'm running:

  • Kubernetes 1.1.2
  • CoreOS 928.0.0
  • networking with flannel
  • everything on Vagrant + VirtualBox

After some debugging, I'm not sure whether the problem is in the workers' iptables rules, in kube-proxy, or in flannel.

PS: I originally posted this question as an issue on the Kubernetes GitHub repository, but the Kubernetes team redirected me here. I reworded the text a bit because it sounded like a "support request", but I actually believe it is some sort of bug. Anyway, sorry about that, Kubernetes team!


Edit: This problem has been confirmed as a bug https://github.com/kubernetes/kubernetes/issues/20391

-- virsox
apache-kafka
kubernetes

2 Answers

1/22/2016

For what you want to do, you should be using a headless Service: http://kubernetes.io/v1.0/docs/user-guide/services.html#headless-services

This means setting

clusterIP: None

in your Service.

With this, there won't be a cluster IP associated with the service; instead, a lookup of the service returns the IPs of all the Pods matched by the selector.
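A headless Service for one of the brokers might look roughly like this (the name, label, and port are assumptions, since the original manifests aren't shown):

```yaml
# Hypothetical headless Service for the kafka1 broker.
apiVersion: v1
kind: Service
metadata:
  name: kafka1
spec:
  clusterIP: None      # headless: no virtual IP; DNS returns the pod IPs directly
  selector:
    app: kafka1        # assumed label on the pods created by the kafka1 RC
  ports:
    - port: 9092       # Kafka's default broker port
```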

-- MrE
Source: StackOverflow

4/26/2016

Update: The bug is fixed in v1.2.4

As a workaround, you can try container lifecycle hooks.

containers:
  - name: kafka
    image: kafka
    lifecycle:
      postStart:
        exec:
          command:
            - "some.sh" # a shell script that gets this pod's IP and tells the other Kafka members "add me to your cluster"
      preStop:
        exec:
          command:
            - "some.sh" # a shell script that gets the other Kafka pods' IPs and tells the other Kafka members "delete me from your cluster"

I ran into a similar problem running 3 MongoDB pods as a cluster: the pods cannot reach themselves through their services' IPs.

In addition, has the bug been fixed?

-- Haoyuan Ge
Source: StackOverflow