Unable to connect to zookeeper server

9/22/2017

I am trying to set up Kafka on Kubernetes using the deployment file below.

kafka-cluster.yml

---
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
  name: kafka-broker1
spec:
  template:
    metadata:
      labels:
        app: kafka
        id: "1"
    spec:
      containers:
      - name: kafka
        image: wurstmeister/kafka
        ports:
        - containerPort: 9092
        env:
        - name: KAFKA_ADVERTISED_PORT
          value: "9092"
        - name: KAFKA_ADVERTISED_HOST_NAME
          value: "192.168.42.182"
        - name: KAFKA_ZOOKEEPER_CONNECT
          value: 10.98.144.178:2181
        - name: KAFKA_BROKER_ID
          value: "1"
        - name: KAFKA_CREATE_TOPICS
          value: topic1:3:3

I am able to telnet to ZooKeeper at 10.98.144.178:2181, but I still get the error below. Please advise on how to proceed:

[2017-09-22 11:22:03,487] FATAL Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server '10.98.144.178:2181' with timeout of 6000 ms
    at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:1233)
    at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:157)
    at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:131)
    at kafka.utils.ZkUtils$.createZkClientAndConnection(ZkUtils.scala:103)
    at kafka.utils.ZkUtils$.apply(ZkUtils.scala:85) 
-- Sarpreet
apache-kafka
apache-zookeeper
kubernetes

1 Answer

9/26/2017

I have faced the same issue. In my observation it is sporadic, and it becomes more frequent as the network delay between the ZooKeeper host and the Kafka host increases. The following configuration properties help alleviate the issue:

zookeeper.connection.timeout.ms
zookeeper.session.timeout.ms

The default value is 6000 ms, which turns out to be too low when there are network delays. I increased these values to 30000 ms to resolve the issue.
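Here is a minimal sketch of how the two timeouts could be applied to the deployment in the question. It assumes the wurstmeister/kafka image's convention of turning `KAFKA_`-prefixed environment variables into `server.properties` entries: the prefix is dropped, the name is lower-cased, and underscores become dots. That is the same mechanism that maps `KAFKA_ZOOKEEPER_CONNECT` to `zookeeper.connect`. The new entries would be appended to the existing `env:` list of the `kafka` container. The value 30000 is the one reported in this answer, not a tuned recommendation.

```yaml
        env:
        # ... existing KAFKA_* entries from the deployment above ...
        # Assumed to become zookeeper.connection.timeout.ms in server.properties
        - name: KAFKA_ZOOKEEPER_CONNECTION_TIMEOUT_MS
          value: "30000"   # quoted: Kubernetes env values must be strings
        # Assumed to become zookeeper.session.timeout.ms in server.properties
        - name: KAFKA_ZOOKEEPER_SESSION_TIMEOUT_MS
          value: "30000"
```

Note that the ZooKeeper server negotiates the session timeout a client requests. By default it clamps the value to between 2 and 20 times its `tickTime`. With the common `tickTime` of 2000 ms, a 30000 ms request fits under the 40000 ms maximum.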

-- Sachin Lala
Source: StackOverflow