How can Datadog get the JMX metrics from Strimzi Kafka pods on AKS?

2/3/2022

I have already read a lot of the Datadog and Strimzi documentation on JMX autodiscovery and JMX configuration, but I'm missing something: it isn't working (Datadog doesn't get the metrics).

I'm using kubectl against an AKS cluster, and I installed Strimzi to run Kafka on it:

helm install strimzi-kafka-release strimzi/strimzi-kafka-operator

and I set up the Kafka and ZooKeeper pods with kafka-single.yaml:

kubectl apply -f kafka-single.yaml  -n aks

Then I installed the Datadog agent with a datadog-values.yaml file:

helm install datadog-agent -f datadog-values.yaml --set datadog.site='datadoghq.com' --set datadog.apiKey='$DD-KEY' datadog/datadog

I can even see that the JMX options are present on the process when I inspect it in Datadog.

I'm pretty sure I have something misplaced or misnamed, but I'm a little frustrated right now and can't figure out what is preventing Datadog from discovering the metrics.

I tried editing the confd option in datadog-values.yaml, but that creates the files in /etc/datadog-agent/conf.d instead of /etc/datadog-agent/conf.d/kafka.d/, which is where the config file is actually picked up and acted on (I guess so, since at least it fails when I change the host).

So I'm editing kafka-conf.yaml and copying it directly to the pod:

kubectl cp kafka-conf.yaml  datadog-agent-pod:/etc/datadog-agent/conf.d/kafka.d/conf.yaml

and then I run the command

kubectl exec -it datadog-agent-pod -- agent jmx list matching

which fails if I set the host to localhost or to anything other than %%host%%.

(This is the failure message I get when I set the host directly to an IP.)

Loading configs...
Config  kafka  was loaded.
2022-02-03 18:49:23 GMT | JMX | INFO | App | JMX Fetch 0.44.6 has started
2022-02-03 18:49:23 GMT | JMX | INFO | App | Found 0 config files
2022-02-03 18:49:24 GMT | JMX | INFO | App | update is in order - updating timestamp: 1643914164
2022-02-03 18:49:24 GMT | JMX | INFO | App | Cleaning up instances...
2022-02-03 18:49:24 GMT | JMX | INFO | App | Dealing with YAML config instances...
2022-02-03 18:49:24 GMT | JMX | INFO | App | Dealing with Auto-Config instances collected...
2022-02-03 18:49:24 GMT | JMX | INFO | App | Instantiating instance for: kafka
2022-02-03 18:49:24 GMT | JMX | INFO | App | Started instance initialization...
2022-02-03 18:49:24 GMT | JMX | INFO | Instance | Trying to connect to JMX Server at 10.244.0.66:9999
2022-02-03 18:49:24 GMT | JMX | INFO | Instance | Connection closed or does not exist. Attempting to create a new connection...
2022-02-03 18:49:24 GMT | JMX | INFO | ConnectionFactory | Connecting using JMX Remote
2022-02-03 18:49:24 GMT | JMX | INFO | Connection | Connecting to: service:jmx:rmi:///jndi/rmi://10.244.0.66:9999/jmxrmi
2022-02-03 18:49:27 GMT | JMX | INFO | App | Completed instance initialization...
2022-02-03 18:49:27 GMT | JMX | WARN | App | Could not initialize instance: kafka-10.244.0.66-9999: 
java.util.concurrent.ExecutionException: java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.CommunicationException [Root exception is java.rmi.ConnectIOException: Exception creating connection to: 10.244.0.66; nested exception is: 	java.net.NoRouteToHostException: No route to host (Host unreachable)]
	at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
	at org.datadog.jmxfetch.App.processRecoveryResults(App.java:1001)
	at org.datadog.jmxfetch.App$6.invoke(App.java:977)
	at org.datadog.jmxfetch.tasks.TaskProcessor.processTasks(TaskProcessor.java:63)
	at org.datadog.jmxfetch.App.init(App.java:969)
	at org.datadog.jmxfetch.App.run(App.java:205)
	at org.datadog.jmxfetch.App.main(App.java:153)
Caused by: java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.CommunicationException [Root exception is java.rmi.ConnectIOException: Exception creating connection to: 10.244.0.66; nested exception is: 	java.net.NoRouteToHostException: No route to host (Host unreachable)]
	at java.management.rmi/javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:370)
	at java.management/javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:270)
	at org.datadog.jmxfetch.Connection.createConnection(Connection.java:64)
	at org.datadog.jmxfetch.RemoteConnection.<init>(RemoteConnection.java:101)
	at org.datadog.jmxfetch.ConnectionFactory.createConnection(ConnectionFactory.java:38)
	at org.datadog.jmxfetch.Instance.getConnection(Instance.java:403)
	at org.datadog.jmxfetch.Instance.init(Instance.java:416)
	at org.datadog.jmxfetch.InstanceInitializingTask.call(InstanceInitializingTask.java:15)
	at org.datadog.jmxfetch.InstanceInitializingTask.call(InstanceInitializingTask.java:3)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: javax.naming.CommunicationException [Root exception is java.rmi.ConnectIOException: Exception creating connection to: 10.244.0.66; nested exception is: 	java.net.NoRouteToHostException: No route to host (Host unreachable)]
	at jdk.naming.rmi/com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:137)
	at java.naming/com.sun.jndi.toolkit.url.GenericURLContext.lookup(GenericURLContext.java:207)
	at java.naming/javax.naming.InitialContext.lookup(InitialContext.java:409)
	at java.management.rmi/javax.management.remote.rmi.RMIConnector.findRMIServerJNDI(RMIConnector.java:1839)
	at java.management.rmi/javax.management.remote.rmi.RMIConnector.findRMIServer(RMIConnector.java:1813)
	at java.management.rmi/javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:302)
	... 12 more
Caused by: java.rmi.ConnectIOException: Exception creating connection to: 10.244.0.66; nested exception is: 
	java.net.NoRouteToHostException: No route to host (Host unreachable)
	at java.rmi/sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:635)
	at java.rmi/sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:209)
	at java.rmi/sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:196)
	at java.rmi/sun.rmi.server.UnicastRef.newCall(UnicastRef.java:343)
	at java.rmi/sun.rmi.registry.RegistryImpl_Stub.lookup(RegistryImpl_Stub.java:116)
	at jdk.naming.rmi/com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:133)
	... 17 more
Caused by: java.net.NoRouteToHostException: No route to host (Host unreachable)
	at org.datadog.jmxfetch.util.JmxfetchRmiClientSocketFactory.getSocketFromFactory(JmxfetchRmiClientSocketFactory.java:67)
	at org.datadog.jmxfetch.util.JmxfetchRmiClientSocketFactory.createSocket(JmxfetchRmiClientSocketFactory.java:40)
	at java.rmi/sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:617)
	... 22 more

But when the host is set to %%host%% there's no error, yet it gets nothing from the Kafka pods.
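
The stack trace above bottoms out in java.net.NoRouteToHostException, which is a TCP-level failure that happens before any JMX exchange takes place. A minimal sketch for checking the same host and port independently of JMXFetch, assuming Python 3 is available wherever it is run (the function name can_connect is made up for this illustration, not part of any Datadog tooling):

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and opens a TCP socket;
        # the context manager closes it again right away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError as exc:
        # OSError covers "no route to host", "connection refused",
        # DNS failures, and timeouts.
        print(f"cannot reach {host}:{port}: {exc}")
        return False
```

Calling can_connect("10.244.0.66", 9999) with the address from the log would show whether the pod IP is reachable at all from that location. A successful connection only proves the port is open; it says nothing about whether JMX itself is configured correctly on the broker.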

What am I doing wrong? Or what is wrong in this setup? I checked other questions and answers and a lot of docs over the last few days just to get the Kafka metrics, and apparently one does not simply configure Datadog for JMX autodiscovery in AKS with Strimzi/Kafka... I just need the topic metrics.

I know that Strimzi is geared toward Prometheus metrics, but I need Datadog, and I already got scolded for trying the Prometheus option (because I couldn't enable it and get the metrics from there into Datadog).

I feel like it has to be something with the annotations, but to be honest I don't know.

Please help, I can't be the only one with this problem.

-- Roberto Morales
apache-kafka
azure-aks
datadog
kubernetes
strimzi

0 Answers