Getting Kafka Connect JMX metrics reporting into Datadog

7/7/2021

I am working on a project involving Kafka Connect. We have a Kafka Connect cluster running on Kubernetes with some Snowflake connectors already spun up and working. What we are having trouble with now is getting the JMX metrics from the Kafka Connect cluster to report in Datadog. From my understanding of the docs (https://docs.confluent.io/home/connect/monitoring.html#using-jmx-to-monitor-kconnect), the workers already emit metrics by default, and we just need a way to get them reported to Datadog.

In our Kubernetes ConfigMap we have these values set:

    CONNECT_KAFKA_JMX_PORT: "9095"
    KAFKA_JMX_PORT: "9095"
    JMX_PORT: "9095"

I have included the part of our launch script where we set the KAFKA_JMX_OPTS env var:

    export KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=<redacted> -Dcom.sun.management.jmxremote.rmi.port=${JMX_PORT}"
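
One thing a line like this does not guard against is JMX_PORT being unset or empty, in which case the RMI port flag expands to nothing. A minimal sketch of a fail-fast variant follows. It assumes a POSIX shell; build_jmx_opts and JMX_HOSTNAME are hypothetical names introduced here, and 127.0.0.1 is only a placeholder for the redacted hostname.

```shell
#!/bin/sh
# Sketch only: build_jmx_opts and JMX_HOSTNAME are illustrative names,
# not part of Kafka Connect or the Confluent images.
build_jmx_opts() {
  # ${N:?msg} aborts with an error if the argument is unset or empty,
  # so a missing port cannot silently produce an empty flag value.
  jmx_port="${1:?JMX port is required}"
  rmi_host="${2:?RMI hostname is required}"
  echo "-Dcom.sun.management.jmxremote=true" \
       "-Dcom.sun.management.jmxremote.authenticate=false" \
       "-Dcom.sun.management.jmxremote.ssl=false" \
       "-Djava.rmi.server.hostname=${rmi_host}" \
       "-Dcom.sun.management.jmxremote.rmi.port=${jmx_port}"
}

# Demo values: 9095 mirrors the ConfigMap; the hostname is a placeholder.
JMX_PORT="${JMX_PORT:-9095}"
JMX_HOSTNAME="${JMX_HOSTNAME:-127.0.0.1}"

KAFKA_JMX_OPTS="$(build_jmx_opts "$JMX_PORT" "$JMX_HOSTNAME")" || exit 1
export KAFKA_JMX_OPTS
echo "$KAFKA_JMX_OPTS"
```

Printing the final value at startup makes it easy to confirm from the container logs which port and hostname the JVM was actually given.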

I’ve been looking online and all over Stack Overflow and haven’t seen an example of anyone getting Kafka Connect JMX metrics reporting to Datadog and standing up a dashboard there, so I was wondering if anyone has experience with this.

-- codean
apache-kafka
apache-kafka-connect
datadog
jmx
kubernetes

1 Answer

7/9/2021

First, your Datadog agents need to have the Java/JMX integration available.

Second, use the Datadog JMX integration with auto-discovery. In the annotations below, kafka-connect must match the container name.

annotations:
  ad.datadoghq.com/kafka-connect.check_names: '["jmx"]'
  ad.datadoghq.com/kafka-connect.init_configs: '[{}]'
  ad.datadoghq.com/kafka-connect.instances: |
    [
      {
        "host": "%%host%%",
        "port": 9095,
        "conf": [
          {
            "include": {
              "domain": "kafka.connect",
              "type": "connector-task-metrics",
              "bean_regex": [
                "kafka.connect:type=connector-task-metrics,connector=.*,task=.*"
              ],
              "attribute": {
                "batch-size-max": {
                  "alias": "jmx.kafka.connect.connector.batch_size_max"
                },
                "status": {
                  "metric_type": "gauge",
                  "alias": "jmx.kafka.connect.connector.status",
                  "values": {
                    "running": 0,
                    "paused": 1,
                    "failed": 2,
                    "destroyed": 3,
                    "unassigned": -1
                  }
                },
                "batch-size-avg": {
                  "alias": "jmx.kafka.connect.connector.batch_size_avg"
                },
                "offset-commit-avg-time-ms": {
                  "alias": "jmx.kafka.connect.connector.offset_commit_avg_time"
                },
                "offset-commit-max-time-ms": {
                  "alias": "jmx.kafka.connect.connector.offset_commit_max_time"
                },
                "offset-commit-failure-percentage": {
                  "alias": "jmx.kafka.connect.connector.offset_commit_failure_percentage"
                }
              }
            }
          },
          {
            "include": {
              "domain": "kafka.connect",
              "type": "source-task-metrics",
              "bean_regex": [
                "kafka.connect:type=source-task-metrics,connector=.*,task=.*"
              ],
              "attribute": {
                "source-record-poll-rate": {
                  "alias": "jmx.kafka.connect.task.source_record_poll_rate"
                },
                "source-record-write-rate": {
                  "alias": "jmx.kafka.connect.task.source_record_write_rate"
                },
                "poll-batch-avg-time-ms": {
                  "alias": "jmx.kafka.connect.task.poll_batch_avg_time"
                },
                "source-record-active-count-avg": {
                  "alias": "jmx.kafka.connect.task.source_record_active_count_avg"
                },
                "source-record-write-total": {
                  "alias": "jmx.kafka.connect.task.source_record_write_total"
                },
                "source-record-poll-total": {
                  "alias": "jmx.kafka.connect.task.source_record_poll_total"
                }
              }
            }
          }
        ]
      }
    ]
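
If no metrics show up after applying the annotations, it can help to first confirm that the JMX port is reachable at all. The following is a minimal sketch, assuming bash and the coreutils timeout command are available where it runs. jmx_port_open, TARGET_HOST and TARGET_PORT are hypothetical names, and 127.0.0.1 is a placeholder for the Kafka Connect pod IP.

```shell
#!/usr/bin/env bash
# Sketch only: checks that a TCP port accepts connections.
# Uses bash's /dev/tcp pseudo-device and coreutils `timeout`.
jmx_port_open() {
  host="$1"
  port="$2"
  # Returns 0 if the connection opens within 3 seconds, non-zero otherwise.
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Placeholder target: set TARGET_HOST to the pod IP when checking from another pod.
TARGET_HOST="${TARGET_HOST:-127.0.0.1}"
TARGET_PORT="${TARGET_PORT:-9095}"   # port used in the annotation

if jmx_port_open "$TARGET_HOST" "$TARGET_PORT"; then
  echo "port ${TARGET_PORT} on ${TARGET_HOST} accepts connections"
else
  echo "port ${TARGET_PORT} on ${TARGET_HOST} is not reachable"
fi
```

An open port only shows that something is listening. JMX over RMI can still fail afterwards if the hostname or RMI port advertised by the JVM is not reachable from the agent.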
-- Wenli Wan
Source: StackOverflow