Kafka with Confluent Kubernetes Helm Charts = Schema Registry WakeupException

10/21/2018

My Main Question: Why is the schema-registry crashing?

Peripheral Question: Why are two pods launching for each of zookeeper/kafka/schema-registry if I've configured one server for each? Does everything thing else look basically right?

helm repo update
<snip>

helm install --values values.yaml --name my-confluent-oss confluentinc/cp-helm-charts
<snip>

helm list
NAME                REVISION    UPDATED                     STATUS      CHART                   APP VERSION NAMESPACE
my-confluent-oss    1           Sat Oct 20 19:09:08 2018    DEPLOYED    cp-helm-charts-0.1.0    1.0         default  

kubectl get pods
NAME                                                   READY     STATUS             RESTARTS   AGE
my-confluent-oss-cp-kafka-0                            2/2       Running            0          20m
my-confluent-oss-cp-schema-registry-59d8877584-c2jc7   1/2       CrashLoopBackOff   7          20m
my-confluent-oss-cp-zookeeper-0                        2/2       Running            0          20m

My values.yaml is as follows. I've tested this out with helm install --debug --dry-run. I'm just disabling persistence, setting a single server (this is a development setup for running in a VM), and disabling the extra services for the moment until I get the basics working:

cp-kafka:
  brokers: 1
  persistence:
    enabled: false

  cp-zookeeper:
    persistence:
      enabled: false
    servers: 1

cp-zookeeper:
  persistence:
    enabled: false
  servers: 1

cp-kafka-connect:
  enabled: false

cp-kafka-rest:
  enabled: false

cp-ksql-server:
  enabled: false

Here are the logs for the failing schema-registry:

kubectl logs my-confluent-oss-cp-schema-registry-59d8877584-c2jc7 cp-schema-registry-server

<snip>
[2018-10-21 00:28:14,738] INFO Kafka version : 2.0.0-cp1 (org.apache.kafka.common.utils.AppInfoParser)
[2018-10-21 00:28:14,738] INFO Kafka commitId : 4b1dd33f255ddd2f (org.apache.kafka.common.utils.AppInfoParser)
[2018-10-21 00:28:14,751] INFO Cluster ID: ofJRwpXzRn-ltDn8b_6h3A (org.apache.kafka.clients.Metadata)
[2018-10-21 00:28:14,753] INFO Initialized last consumed offset to -1 (io.confluent.kafka.schemaregistry.storage.KafkaStoreReaderThread)
[2018-10-21 00:28:14,756] INFO [kafka-store-reader-thread-_schemas]: Starting (io.confluent.kafka.schemaregistry.storage.KafkaStoreReaderThread)
[2018-10-21 00:28:14,800] INFO [Consumer clientId=KafkaStore-reader-_schemas, groupId=my-confluent-oss] Resetting offset for partition _schemas-0 to offset 0. (org.apache.kafka.clients.consumer.internals.Fetcher)
[2018-10-21 00:28:14,821] INFO Cluster ID: ofJRwpXzRn-ltDn8b_6h3A (org.apache.kafka.clients.Metadata)
[2018-10-21 00:28:14,857] INFO Wait to catch up until the offset of the last message at 7 (io.confluent.kafka.schemaregistry.storage.KafkaStore)
[2018-10-21 00:28:14,930] INFO Joining schema registry with Kafka-based coordination (io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry)
[2018-10-21 00:28:14,939] INFO Kafka version : 2.0.0-cp1 (org.apache.kafka.common.utils.AppInfoParser)
[2018-10-21 00:28:14,940] INFO Kafka commitId : 4b1dd33f255ddd2f (org.apache.kafka.common.utils.AppInfoParser)
[2018-10-21 00:28:14,953] INFO Cluster ID: ofJRwpXzRn-ltDn8b_6h3A (org.apache.kafka.clients.Metadata)
[2018-10-21 00:29:14,945] ERROR Error starting the schema registry (io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication)
io.confluent.kafka.schemaregistry.exceptions.SchemaRegistryInitializationException: io.confluent.kafka.schemaregistry.exceptions.SchemaRegistryTimeoutException: Timed out waiting for join group to complete
    at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.init(KafkaSchemaRegistry.java:220)
    at io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication.setupResources(SchemaRegistryRestApplication.java:63)
    at io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication.setupResources(SchemaRegistryRestApplication.java:41)
    at io.confluent.rest.Application.createServer(Application.java:169)
    at io.confluent.kafka.schemaregistry.rest.SchemaRegistryMain.main(SchemaRegistryMain.java:43)
Caused by: io.confluent.kafka.schemaregistry.exceptions.SchemaRegistryTimeoutException: Timed out waiting for join group to complete
    at io.confluent.kafka.schemaregistry.masterelector.kafka.KafkaGroupMasterElector.init(KafkaGroupMasterElector.java:202)
    at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.init(KafkaSchemaRegistry.java:215)
    ... 4 more
[2018-10-21 00:29:14,948] INFO Shutting down schema registry (io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry)
[2018-10-21 00:29:14,949] INFO [kafka-store-reader-thread-_schemas]: Shutting down (io.confluent.kafka.schemaregistry.storage.KafkaStoreReaderThread)
[2018-10-21 00:29:14,950] INFO [kafka-store-reader-thread-_schemas]: Stopped (io.confluent.kafka.schemaregistry.storage.KafkaStoreReaderThread)
[2018-10-21 00:29:14,951] INFO [kafka-store-reader-thread-_schemas]: Shutdown completed (io.confluent.kafka.schemaregistry.storage.KafkaStoreReaderThread)
[2018-10-21 00:29:14,953] INFO KafkaStoreReaderThread shutdown complete. (io.confluent.kafka.schemaregistry.storage.KafkaStoreReaderThread)
[2018-10-21 00:29:14,953] INFO [Producer clientId=producer-1] Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms. (org.apache.kafka.clients.producer.KafkaProducer)
[2018-10-21 00:29:14,959] ERROR Unexpected exception in schema registry group processing thread (io.confluent.kafka.schemaregistry.masterelector.kafka.KafkaGroupMasterElector)
org.apache.kafka.common.errors.WakeupException
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.maybeTriggerWakeup(ConsumerNetworkClient.java:498)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:284)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:242)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:161)
    at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:243)
    at io.confluent.kafka.schemaregistry.masterelector.kafka.SchemaRegistryCoordinator.ensureCoordinatorReady(SchemaRegistryCoordinator.java:207)
    at io.confluent.kafka.schemaregistry.masterelector.kafka.SchemaRegistryCoordinator.poll(SchemaRegistryCoordinator.java:97)
    at io.confluent.kafka.schemaregistry.masterelector.kafka.KafkaGroupMasterElector$1.run(KafkaGroupMasterElector.java:192)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

I'm using minikube 0.30.0 and a fresh, clean minikube vm:

➜  kubectl version

Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.5", GitCommit:"32ac1c9073b132b8ba18aa830f46b77dcceb0723", GitTreeState:"clean", BuildDate:"2018-06-22T05:40:33Z", GoVersion:"go1.9.7", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.0", GitCommit:"fc32d2f3698e36b93322a3465f63a14e9f0eaead", GitTreeState:"clean", BuildDate:"2018-03-26T16:44:10Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
-- clay
apache-kafka
confluent
confluent-schema-registry
kubernetes
kubernetes-helm

1 Answer

10/21/2018

Your schema registry can't join your Kafka group. You'll have to check the configs, your schema registry needs to perform a leader election initially and that leader election could be either through Zookeeper or Kafka.

Looks like the Helm chart installs the schema registry using Kafka leader election, and you can also see that you can manually pass the Kafka broker parameter or it picks it from .Values.kafka.bootstrapServers, but also the value for .bootstrapServers appears empty. You can see what config value is in your deployment by simply running something like:

$ kubectl get deployment my-confluent-oss-cp-schema-registry -o=yaml

Then you can change it to point the internal Kubernetes my-confluent-oss-cp-kafka service endpoint with:

$ kubectl edit deployment cp-schema-registry

Also, note that as of this writing the cp-helm-charts are in developer preview so use it at your own risk.

The other parameter you can configure is SCHEMA_REGISTRY_KAFKASTORE_INIT_TIMEOUT_CONFIG since this is exactly where you are seeing the error. So the Kafka Schema registry maybe timing out while trying to connect to the Kafka store. (maybe related to minikube). What's kind of odd is that it should retry.

-- Rico
Source: StackOverflow