kafka readiness probes failing

6/18/2018

I deployed kafka and zookeeper in kubernetes. My kafka readiness probes keeps failing if I have readiness probes for zookeeper. In case if I comment or delete readiness probes of zookeeper and deploy again, then the kafka server is starting without any problem( and as well as kafka readiness not failing).

This is the readiness probe for zookeeper:-

readinessProbe:
  tcpSocket:
    port: 2181
  initialDelaySeconds: 20
  periodSeconds: 20
  timeoutSeconds: 5
  successThreshold: 1
  failureThreshold: 3

my zookeeper log is

2018-06-18 11:27:24,863 [myid:0] - WARN  [SendWorker:5135603447292250196:QuorumCnxManager$SendWorker@951] - Send worker leaving thread
2018-06-18 11:27:24,864 [myid:0] - INFO  [kafka1-zookeeper-0.kafka1-zookeeper/172.30.99.87:3888:QuorumCnxManager$Listener@743] - Received connection request /10.186.58.164:57728
2018-06-18 11:27:24,864 [myid:0] - WARN  [RecvWorker:1586112601866174465:QuorumCnxManager$RecvWorker@1025] - Connection broken for id 1586112601866174465, my id = 0, error = 
java.io.IOException: Received packet with invalid packet: -66911279
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1012)
2018-06-18 11:27:24,865 [myid:0] - WARN  [RecvWorker:1586112601866174465:QuorumCnxManager$RecvWorker@1028] - Interrupting SendWorker
2018-06-18 11:27:24,865 [myid:0] - WARN  [SendWorker:1586112601866174465:QuorumCnxManager$SendWorker@941] - Interrupted while waiting for message on queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2025)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2099)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:429)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1094)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$700(QuorumCnxManager.java:74)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:929)
2018-06-18 11:27:24,868 [myid:0] - WARN  [SendWorker:1586112601866174465:QuorumCnxManager$SendWorker@951] - Send worker leaving thread
2018-06-18 11:30:54,282 [myid:0] - INFO  [kafka1-zookeeper-0.kafka1-zookeeper/172.30.99.87:3888:QuorumCnxManager$Listener@743] - Received connection request /10.186.58.164:47944
2018-06-18 11:31:39,342 [myid:0] - WARN  [kafka1-zookeeper-0.kafka1-zookeeper/172.30.99.87:3888:QuorumCnxManager@461] - Exception reading or writing challenge: java.net.SocketException: Connection reset
2018-06-18 11:31:39,342 [myid:0] - INFO  [kafka1-zookeeper-0.kafka1-zookeeper/172.30.99.87:3888:QuorumCnxManager$Listener@743] - Received connection request /10.186.58.164:47946
2018-06-18 11:31:39,342 [myid:0] - WARN  [RecvWorker:5135603447292250196:QuorumCnxManager$RecvWorker@1025] - Connection broken for id 5135603447292250196, my id = 0, error = 
java.io.IOException: Received packet with invalid packet: 1414541105
at        org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1012)
2018-06-18 11:31:39,343 [myid:0] - WARN  [RecvWorker:5135603447292250196:QuorumCnxManager$RecvWorker@1028] - Interrupting SendWorker
2018-06-18 11:31:39,343 [myid:0] - WARN  [SendWorker:5135603447292250196:QuorumCnxManager$SendWorker@941] - Interrupted while waiting for message on queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2025)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2099)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:429)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1094)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$700(QuorumCnxManager.java:74)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:929)
 2018-06-18 11:31:39,343 [myid:0] - WARN  [SendWorker:5135603447292250196:QuorumCnxManager$SendWorker@951] - Send worker leaving thread
2018-06-18 11:31:44,433 [myid:0] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@215] - Accepted socket connection from /172.30.99.87:51010
2018-06-18 11:31:44,437 [myid:0] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@215] - Accepted socket connection from /172.30.99.87:51012
2018-06-18 11:31:44,439 [myid:0] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@376] - Unable to read additional data from client sessionid 0x0, likely client has closed socket
2018-06-18 11:31:44,440 [myid:0] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1040] - Closed socket connection for client /172.30.99.87:51012 (no session established for client)
2018-06-18 11:31:44,452 [myid:0] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@215] - Accepted socket connection from /172.30.99.87:51014
2018-06-18 11:31:49,438 [myid:0] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@376] - Unable to read additional data from client sessionid 0x0, likely client has closed socket
2018-06-18 11:31:49,438 [myid:0] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1040] - Closed socket connection for client /172.30.99.87:51010 (no session established for client)
2018-06-18 11:31:49,452 [myid:0] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@376] - Unable to read additional data from client sessionid 0x0, likely client has closed socket
2018-06-18 11:31:49,453 [myid:0] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1040] - Closed socket connection for client /172.30.99.87:51014 (no session established for client)
2018-06-18 11:33:59,669 [myid:0] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@215] - Accepted socket connection from /172.30.99.87:51148
2018-06-18 11:33:59,700 [myid:0] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@376] - Unable to read additional data from client sessionid 0x0, likely client has closed socket
2018-06-18 11:33:59,700 [myid:0] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1040] - Closed socket connection for client /172.30.99.87:51148 (no session established for client)
2018-06-18 11:33:59,713 [myid:0] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@215] - Accepted socket connection from /172.30.99.87:51150
2018-06-18 11:33:59,730 [myid:0] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@376] - Unable to read additional data from client sessionid 0x0, likely client has closed socket
2018-06-18 11:33:59,730 [myid:0] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1040] - Closed socket connection for client /172.30.99.87:51150 (no session established for client)
2018-06-18 11:34:00,274 [myid:0] - INFO  [kafka1-zookeeper-0.kafka1-zookeeper/172.30.99.87:3888:QuorumCnxManager$Listener@743] - Received connection request /10.186.58.164:48860
2018-06-18 11:34:00,275 [myid:0] - WARN  [RecvWorker:4616370699239609664:QuorumCnxManager$RecvWorker@1025] - Connection broken for id 4616370699239609664, my id = 0, error = 
java.io.IOException: Received packet with invalid packet: -1200847881
at  org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1012)
2018-06-18 11:34:00,275 [myid:0] - WARN  [RecvWorker:4616370699239609664:QuorumCnxManager$RecvWorker@1028] - Interrupting SendWorker
2018-06-18 11:34:00,275 [myid:0] - WARN  [SendWorker:4616370699239609664:QuorumCnxManager$SendWorker@941] - Interrupted while waiting for message on queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2025)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2099)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:429)
at   org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1094)
at   org.apache.zookeeper.server.quorum.QuorumCnxManager.access$700(QuorumCnxManager.java:74)
at  org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:929)
2018-06-18 11:34:00,276 [myid:0] - WARN  [SendWorker:4616370699239609664:QuorumCnxManager$SendWorker@951] - Send worker leaving thread
2018-06-18 11:34:00,277 [myid:0] - INFO  [kafka1-zookeeper-0.kafka1-zookeeper/172.30.99.87:3888:QuorumCnxManager$Listener@743] - Received connection request /10.186.58.164:48862
2018-06-18 11:34:00,285 [myid:0] - WARN  [kafka1-zookeeper-0.kafka1-zookeeper/172.30.99.87:3888:QuorumCnxManager@461] - Exception reading or writing challenge: java.net.SocketException: Connection reset
2018-06-18 11:40:10,712 [myid:0] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@215] - Accepted socket connection from /172.30.99.87:51522
2018-06-18 11:40:10,713 [myid:0] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@376] - Unable to read additional data from client sessionid 0x0, likely client has closed socket
2018-06-18 11:40:10,713 [myid:0] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1040] - Closed socket connection for client /172.30.99.87:51522 (no session established for client)
2018-06-18 11:40:10,782 [myid:0] - INFO  [kafka1-zookeeper-0.kafka1-zookeeper/172.30.99.87:3888:QuorumCnxManager$Listener@743] - Received connection request /10.186.58.164:49556
2018-06-18 11:40:10,782 [myid:0] - WARN  [kafka1-zookeeper-0.kafka1-zookeeper/172.30.99.87:3888:QuorumCnxManager@461] - Exception reading or writing challenge: java.net.SocketException: Connection reset
2018-06-18 16:07:03,456 [myid:0] - INFO  [PurgeTask:DatadirCleanupManager$PurgeTask@138] - Purge task started.
2018-06-18 16:07:03,459 [myid:0] - INFO  [PurgeTask:DatadirCleanupManager$PurgeTask@144] - Purge task completed
-- Arun
apache-kafka
apache-kafka-connect
apache-zookeeper
kubernetes

2 Answers

6/19/2018

You're binding your kafka to zookeeper in a way that it's not a good practice.

I use kafka from apache official site and I have found that there's a specific script to make a readiness probe:

readinessProbe:
  exec:
    command:
    - sh
    - -c
    - "/opt/kafka/bin/kafka-broker-api-versions.sh --bootstrap-server=localhost:9092"
-- Nicola Ben
Source: StackOverflow

3/6/2019

I had similar issue. following changes, helped me overcoming this.

# readinessProbe & livenessProbe 
  readinessProbe:
    tcpSocket:
      port: 9092
    timeoutSeconds: 5
    periodSeconds: 5
    initialDelaySeconds: 45
  livenessProbe:
    exec:
      command:
      - sh
      - -c
      - "kafka-broker-api-versions.sh --bootstrap-server=localhost:9092"
    timeoutSeconds: 5
    periodSeconds: 5
    initialDelaySeconds: 60

Basing on your requirement, You can update following value.

initialDelaySeconds

-- Himansu
Source: StackOverflow