Kafka Pod doesn't start on GKE

10/25/2018

I followed this tutorial, and when I tried to run it on GKE I was not able to start the Kafka pod.

It stays in CrashLoopBackOff all the time, and I don't know how to view the pod's error logs.
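
(Side note, not from the tutorial, just standard kubectl usage: the logs of a crashing container can usually be pulled with kubectl logs, and the --previous flag shows the output of the last terminated attempt, which is what matters for CrashLoopBackOff.)

kubectl logs kafka-broker1-54cb95fb44-hlj5b
kubectl logs kafka-broker1-54cb95fb44-hlj5b --previous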

Here is the output when I run kubectl describe pod my-pod-xxx:

Name:           kafka-broker1-54cb95fb44-hlj5b
Namespace:      default
Node:           gke-xxx-default-pool-f9e313ed-zgcx/10.146.0.4
Start Time:     Thu, 25 Oct 2018 11:40:21 +0900
Labels:         app=kafka
                id=1
                pod-template-hash=1076519600
Annotations:    kubernetes.io/limit-ranger=LimitRanger plugin set: cpu request for container kafka
Status:         Running
IP:             10.48.8.10
Controlled By:  ReplicaSet/kafka-broker1-54cb95fb44
Containers:
  kafka:
    Container ID:   docker://88ee6a1df4157732fc32b7bd8a81e329dbdxxxx9cbe614689e775d183dbcd61
    Image:          wurstmeister/kafka
    Image ID:       docker-pullable://wurstmeister/kafka@sha256:4f600a95fa1288f7b1xxxxxa32ca00b4fb13b83b31533fa6b40499bd9bdf192f
    Port:           9092/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Thu, 25 Oct 2018 14:35:32 +0900
      Finished:     Thu, 25 Oct 2018 14:35:51 +0900
    Ready:          False
    Restart Count:  37
    Requests:
      cpu:  100m
    Environment:
      KAFKA_ADVERTISED_PORT:       9092
      KAFKA_ADVERTISED_HOST_NAME:  35.194.100.32
      KAFKA_ZOOKEEPER_CONNECT:     zoo1:2181
      KAFKA_BROKER_ID:             1
      KAFKA_CREATE_TOPICS:         topic1:3:3
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-w6s7n (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  default-token-w6s7n:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-w6s7n
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason   Age                From                                                   Message
  ----     ------   ----               ----                                                   -------
  Warning  BackOff  5m (x716 over 2h)  kubelet, gke-xxx-default-pool-f9e313ed-zgcx  Back-off restarting failed container
  Normal   Pulling  36s (x38 over 2h)  kubelet, gke-xxx-default-pool-f9e313ed-zgcx  pulling image "wurstmeister/kafka"

I noticed that the first run goes well, but after that the node changes status to NotReady and the Kafka pod enters the CrashLoopBackOff state.
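
(Also not from the tutorial, but the usual way to see why a node flips to NotReady is to inspect it directly; the node name below is the one from the describe output above, and kubectl top only works if metrics are available on the cluster.)

kubectl get nodes
kubectl describe node gke-xxx-default-pool-f9e313ed-zgcx
kubectl top node

The node's Conditions section (Ready, MemoryPressure, DiskPressure) usually points at the resource that is being exhausted.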

Here are the pod events from before it goes down:

Type     Reason                  Age   From                                         Message
----     ------                  ----  ----                                         -------
Normal   Scheduled               5m    default-scheduler                            Successfully assigned kafka-broker1-54cb95fb44-wwf2h to gke-xxx-default-pool-f9e313ed-8mr6
Normal   SuccessfulMountVolume   5m    kubelet, gke-xxx-default-pool-f9e313ed-8mr6  MountVolume.SetUp succeeded for volume "default-token-w6s7n"
Normal   Pulling                 5m    kubelet, gke-xxx-default-pool-f9e313ed-8mr6  pulling image "wurstmeister/kafka"
Normal   Pulled                  5m    kubelet, gke-xxx-default-pool-f9e313ed-8mr6  Successfully pulled image "wurstmeister/kafka"
Normal   Created                 5m    kubelet, gke-xxx-default-pool-f9e313ed-8mr6  Created container
Normal   Started                 5m    kubelet, gke-xxx-default-pool-f9e313ed-8mr6  Started container
Normal   NodeControllerEviction  38s   node-controller                              Marking for deletion Pod kafka-broker1-54cb95fb44-wwf2h from Node gke-dev-centurion-default-pool-f9e313ed-8mr6

Could anyone tell me what's wrong with my pod, and how I can capture the error behind the pod failure?

-- Quoc Lap
apache-kafka
google-kubernetes-engine
kubernetes

1 Answer

10/25/2018

I just figured out that my cluster's nodes did not have enough resources. After creating a new cluster with more memory, it works.
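
(For anyone hitting the same thing: exit code 137 in the describe output above is SIGKILL, which on a memory-starved node is typically the OOM killer, so "not enough memory on the nodes" fits. A minimal sketch of recreating the cluster on larger nodes; the cluster name, zone, and machine type here are placeholders, not values from the tutorial:)

gcloud container clusters create kafka-cluster \
    --zone asia-northeast1-a \
    --machine-type n1-standard-2 \
    --num-nodes 3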

-- Quoc Lap
Source: StackOverflow