Kafka on Kubernetes - UNKNOWN_TOPIC_OR_PARTITION and LEADER_NOT_AVAILABLE error

5/9/2018

This is a follow-up to an earlier question. I have managed to do the following:

  1. Create a headless service for my 5 broker Kafka cluster for inter-broker communication
  2. Set up one service for each broker
    1. each service has an external IP
    2. only one pod is selected for each service, e.g. service "kafka-0-es" selects the pod "kafka-0"
  3. Each pod advertises its own external IP correctly. I verified this by inspecting the broker data with the ZooKeeper CLI.

I created a topic test-topic with zkCli and verified it has been created. After that, I started the Kafka console producer.

.\kafka-console-producer.bat --broker-list EXTERNAL_IP_1:9093,EXTERNAL_IP_2:9093,EXTERNAL_IP_3:9093,EXTERNAL_IP_4:9093,EXTERNAL_IP_5:9093 --topic test-topic --property parse.key=true --property key.separator=:
>afkjdshasdkfjhsdkjsf:128379127893123
>[2018-05-09 17:35:51,622] WARN [Producer clientId=console-producer] Got error produce response with correlation id 9 on topic-partition test-topic-0, retrying (2 attempts left). Error: UNKNOWN_TOPIC_OR_PARTITION (org.apache.kafka.clients.producer.internals.Sender)
[2018-05-09 17:35:51,623] WARN [Producer clientId=console-producer] Received unknown topic or partition error in produce request on partition test-topic-0. The topic/partition may not exist or the user may not have Describe access to it (org.apache.kafka.clients.producer.internals.Sender)
[2018-05-09 17:35:51,649] WARN [Producer clientId=console-producer] Error while fetching metadata with correlation id 10 : {test-topic=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
[2018-05-09 17:35:51,720] WARN [Producer clientId=console-producer] Got error produce response with correlation id 11 on topic-partition test-topic-0, retrying (1 attempts left). Error: UNKNOWN_TOPIC_OR_PARTITION (org.apache.kafka.clients.producer.internals.Sender)
[2018-05-09 17:35:51,720] WARN [Producer clientId=console-producer] Received unknown topic or partition error in produce request on partition test-topic-0. The topic/partition may not exist or the user may not have Describe access to it (org.apache.kafka.clients.producer.internals.Sender)
[2018-05-09 17:35:51,773] WARN [Producer clientId=console-producer] Error while fetching metadata with correlation id 12 : {test-topic=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
[2018-05-09 17:35:51,823] WARN [Producer clientId=console-producer] Got error produce response with correlation id 13 on topic-partition test-topic-0, retrying (0 attempts left). Error: UNKNOWN_TOPIC_OR_PARTITION (org.apache.kafka.clients.producer.internals.Sender)
[2018-05-09 17:35:51,823] WARN [Producer clientId=console-producer] Received unknown topic or partition error in produce request on partition test-topic-0. The topic/partition may not exist or the user may not have Describe access to it (org.apache.kafka.clients.producer.internals.Sender)
[2018-05-09 17:35:51,913] WARN [Producer clientId=console-producer] Error while fetching metadata with correlation id 14 : {test-topic=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
[2018-05-09 17:35:51,936] ERROR Error when sending message to topic test-topic with key: 20 bytes, value: 15 bytes with error: (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
[2018-05-09 17:35:51,945] WARN [Producer clientId=console-producer] Received unknown topic or partition error in produce request on partition test-topic-0. The topic/partition may not exist or the user may not have Describe access to it (org.apache.kafka.clients.producer.internals.Sender)
[2018-05-09 17:35:52,034] WARN [Producer clientId=console-producer] Error while fetching metadata with correlation id 16 : {test-topic=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
[2018-05-09 17:35:52,161] WARN [Producer clientId=console-producer] Error while fetching metadata with correlation id 20 : {test-topic=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
[2018-05-09 17:40:52,288] WARN [Producer clientId=console-producer] Error while fetching metadata with correlation id 25 : {test-topic=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)

According to ZooKeeper, my Kafka broker "kafka-2" is the leader for partition 0 of this topic:

get /kafka/brokers/topics/test-topic/partitions/0/state

{"controller_epoch":5,"leader":2,"version":1,"leader_epoch":0,"isr":[2,1]} 

However, the pod kafka-2 logs the following error:

[2018-05-09 15:21:02,524] ERROR [ReplicaFetcherThread-0-2], Error for partition [test-topic,0] to broker 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)

I am not sure why this is happening; the configuration looks fine to me. Is there something else I am missing to get my Kafka cluster running on Kubernetes?

Note that I have also tried wiping my cluster completely (scale down the Kafka cluster, delete the Kafka storage, scale down the ZooKeeper cluster, delete the ZooKeeper storage, scale up ZooKeeper, scale up Kafka), but to no avail.

-- j9dy
apache-kafka
kubernetes

1 Answer

5/11/2018

I have fixed it just now. The problem was that my headless service contained both the internal and the external port.

Now my headless service contains only the internal port:

apiVersion: v1
kind: Service
metadata:
  name: kafka-hs
  labels:
    app: kafka
spec:
  ports:
  - port: 29092
    name: server
  clusterIP: None
  selector:
    app: kafka

And my per-pod services, which expose the external IPs, contain the external port (note that a Red Hat OpenShift script handles the allocation of external IPs to these services; this is not covered in the service definition):

apiVersion: v1
kind: Service
metadata:
  name: kafka-es-4
  labels:
    app: kafka
  namespace: whatever
spec:
  ports:
  - port: 9093
    name: kafka-port
    protocol: TCP
  selector:
    statefulset.kubernetes.io/pod-name: kafka-4
    app: kafka
  type: LoadBalancer
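
For context, a broker that is reached on one port from inside the cluster and on a different port from outside typically defines two listeners. The fragment below is only a sketch of what the server.properties of broker kafka-4 could look like with the two ports used above; it is not the actual configuration of the cluster described here. The listener names INTERNAL and EXTERNAL, the PLAINTEXT protocol, the pod DNS name (which assumes the StatefulSet uses kafka-hs as its serviceName in the namespace whatever) and the EXTERNAL_IP_5 placeholder are all assumptions.

```properties
# Sketch only: listener names, protocol and host names are assumptions.

# Bind one listener per port: 29092 for in-cluster traffic, 9093 for external clients.
listeners=INTERNAL://0.0.0.0:29092,EXTERNAL://0.0.0.0:9093

# Addresses handed out to clients in metadata responses.
# INTERNAL resolves through the headless service kafka-hs,
# EXTERNAL is the IP allocated to the per-pod service kafka-es-4.
advertised.listeners=INTERNAL://kafka-4.kafka-hs.whatever.svc.cluster.local:29092,EXTERNAL://EXTERNAL_IP_5:9093

# Map each listener name to a security protocol.
listener.security.protocol.map=INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT

# Brokers talk to each other over the internal listener only.
inter.broker.listener.name=INTERNAL
```

With a layout like this, the headless service only needs to expose 29092, and each per-pod service only needs to expose 9093, which matches the two service definitions above.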
-- j9dy
Source: StackOverflow