SSL handshake failed in Kafka broker

4/8/2021

I have a Kafka cluster in Kubernetes created using Strimzi.

apiVersion: kafka.strimzi.io/v1beta1
kind: Kafka
metadata:
  name: {{ .Values.cluster.kafka.name }}
spec:
  kafka:
    version: 2.7.0
    replicas: 3
    storage:
      deleteClaim: true
      size: {{ .Values.cluster.kafka.storagesize }}
      type: persistent-claim
    rack: 
      topologyKey: failure-domain.beta.kubernetes.io/zone
    template:
      pod:
        metadata:
          annotations:
            prometheus.io/scrape: 'true'
            prometheus.io/port: '9404'                                           
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
        authentication:
          type: tls
      - name: external
        port: 9094
        type: loadbalancer
        tls: true
        authentication:
          type: tls
        configuration:  
          bootstrap:
            loadBalancerIP: {{ .Values.cluster.kafka.bootstrapipaddress }}
          brokers:  
          {{- range  $key, $value := (split "," .Values.cluster.kafka.brokersipaddress) }}  
            - broker: {{ (split "=" .)._0 }}
              loadBalancerIP: {{ (split "=" .)._1 | quote }}
          {{- end }}
    authorization:
      type: simple

The cluster is created and up, and I am able to create topics and produce/consume to/from them. The issue is that if I exec into one of the Kafka broker pods, I see intermittent errors:

INFO [SocketServer brokerId=0] Failed authentication with /10.240.0.35 (SSL handshake failed) (org.apache.kafka.common.network.Selector) [data-plane-kafka-network-thread-0-ListenerName(EXTERNAL-9094)-SSL-9]

INFO [SocketServer brokerId=0] Failed authentication with /10.240.0.159 (SSL handshake failed) (org.apache.kafka.common.network.Selector) [data-plane-kafka-network-thread-0-ListenerName(EXTERNAL-9094)-SSL-11]

INFO [SocketServer brokerId=0] Failed authentication with /10.240.0.4 (SSL handshake failed) (org.apache.kafka.common.network.Selector) [data-plane-kafka-network-thread-0-ListenerName(EXTERNAL-9094)-SSL-10]

INFO [SocketServer brokerId=0] Failed authentication with /10.240.0.128 (SSL handshake failed) (org.apache.kafka.common.network.Selector) [data-plane-kafka-network-thread-0-ListenerName(EXTERNAL-9094)-SSL-1]

After inspecting these IPs (10.240.0.35, 10.240.0.159, 10.240.0.4, 10.240.0.128) I found that they all belong to pods from the kube-system namespace which are implicitly created as part of the Kafka cluster deployment.
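For reference, one way to map an IP from the broker log back to a pod (assuming you have cluster-wide read access and the source IPs are pod IPs rather than NATed node addresses) is something like:

```shell
# List all pods with their IPs and filter for one of the addresses
# seen in the broker log (substitute the IP you are investigating).
kubectl get pods --all-namespaces -o wide | grep 10.240.0.35
```

If nothing matches, the address may belong to a node or a load balancer rather than a pod.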


Any idea what can be wrong?

-- Inako
apache-kafka
kubernetes
strimzi

1 Answer

4/8/2021

I do not think this is necessarily wrong. You seem to have some application somewhere trying to connect to the broker without properly configured TLS. But as the connection is forwarded, the IP probably gets masked - so it does not show the real external IP anymore. This can be all kinds of things, from misconfigured clients up to health checks that just open a TCP connection (depending on your environment, the load balancer can do this, for example).

Unfortunately, it is a bit hard to find out where they really come from. You can try to trace them through the logs of whoever owns the IP address they came from, as that might have forwarded them from someone else, etc. You could also try to enable TLS debugging in Kafka with the Java system property javax.net.debug=ssl. But that might help only in some cases with misconfigured clients, not with plain TCP probes, and it will also make it hard to find the right place in the logs because it will dump the replication traffic etc., which uses TLS as well.
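If you do want to try the TLS debug route with Strimzi, you would not set the property on the broker directly but through the Kafka custom resource, which supports passing Java system properties to the broker JVM. A minimal sketch (assuming your Strimzi version supports `spec.kafka.jvmOptions.javaSystemProperties`; remember to remove it again, as the output is very verbose):

```yaml
# Sketch: enable JSSE debug logging on the Kafka brokers via the
# Strimzi Kafka CR. Merge this into your existing spec.kafka section.
spec:
  kafka:
    jvmOptions:
      javaSystemProperties:
        - name: javax.net.debug
          value: ssl
```

Applying this change will trigger a rolling restart of the brokers, after which the handshake details appear in the broker pod logs.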

-- Jakub
Source: StackOverflow