I already read a lot of the documentation from Datadog and Strimzi about the JMX autodiscovery and the JMX configuration. But I missing something, at least it's not working (dd doesn't get the metrics)
Im using kubectl to an AKS, installed Strimzi to use Kafka on AKS
helm install strimzi-kafka-release strimzi/strimzi-kafka-operator
and with kafka-single.yaml setting up the kafka and zokeeper pods
kubectl apply -f kafka-single.yaml -n aks
then install the datadog agent with datadog-values.yaml file
helm install datadog-agent -f datadog-values.yaml --set datadog.site='datadoghq.com' --set datadog.apiKey='$DD-KEY' datadog/datadog
and I can even see the options for the jmx to be available on the process inspect in Datadog
I'm pretty sure I have something badplaced or badcalled, but I'm a little frustrated rn and can't get to what is the thing that doesn't allow the metrics to be discoverable for datadog.
I tried to edit the confd option on the datadog-values.yaml, but creates the files in /etc/datadog-agent/conf.d instead of /etc/datadog-agent/conf.d/kafka.d/ where it is recognized the conf file and try to do something (I guess, at least fails when I change the host)
I'm editing and copying kafka-conf.yaml directly to the pod
kubectl cp kafka-conf.yaml datadog-agent-pod:/etc/datadog-agent/conf.d/kafka.d/conf.yaml
and then I try the command
kubectl exec -it datadog-agent-pod agent jmx list matching
where it fails if I put localhost or somethig else different than %%host%%
(the failing message when I tried with directly wtit an IP)
Loading configs...
Config kafka was loaded.
2022-02-03 18:49:23 GMT | JMX | INFO | App | JMX Fetch 0.44.6 has started
2022-02-03 18:49:23 GMT | JMX | INFO | App | Found 0 config files
2022-02-03 18:49:24 GMT | JMX | INFO | App | update is in order - updating timestamp: 1643914164
2022-02-03 18:49:24 GMT | JMX | INFO | App | Cleaning up instances...
2022-02-03 18:49:24 GMT | JMX | INFO | App | Dealing with YAML config instances...
2022-02-03 18:49:24 GMT | JMX | INFO | App | Dealing with Auto-Config instances collected...
2022-02-03 18:49:24 GMT | JMX | INFO | App | Instantiating instance for: kafka
2022-02-03 18:49:24 GMT | JMX | INFO | App | Started instance initialization...
2022-02-03 18:49:24 GMT | JMX | INFO | Instance | Trying to connect to JMX Server at 10.244.0.66:9999
2022-02-03 18:49:24 GMT | JMX | INFO | Instance | Connection closed or does not exist. Attempting to create a new connection...
2022-02-03 18:49:24 GMT | JMX | INFO | ConnectionFactory | Connecting using JMX Remote
2022-02-03 18:49:24 GMT | JMX | INFO | Connection | Connecting to: service:jmx:rmi:///jndi/rmi://10.244.0.66:9999/jmxrmi
2022-02-03 18:49:27 GMT | JMX | INFO | App | Completed instance initialization...
2022-02-03 18:49:27 GMT | JMX | WARN | App | Could not initialize instance: kafka-10.244.0.66-9999:
java.util.concurrent.ExecutionException: java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.CommunicationException [Root exception is java.rmi.ConnectIOException: Exception creating connection to: 10.244.0.66; nested exception is: java.net.NoRouteToHostException: No route to host (Host unreachable)]
at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
at org.datadog.jmxfetch.App.processRecoveryResults(App.java:1001)
at org.datadog.jmxfetch.App$6.invoke(App.java:977)
at org.datadog.jmxfetch.tasks.TaskProcessor.processTasks(TaskProcessor.java:63)
at org.datadog.jmxfetch.App.init(App.java:969)
at org.datadog.jmxfetch.App.run(App.java:205)
at org.datadog.jmxfetch.App.main(App.java:153)
Caused by: java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.CommunicationException [Root exception is java.rmi.ConnectIOException: Exception creating connection to: 10.244.0.66; nested exception is: java.net.NoRouteToHostException: No route to host (Host unreachable)]
at java.management.rmi/javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:370)
at java.management/javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:270)
at org.datadog.jmxfetch.Connection.createConnection(Connection.java:64)
at org.datadog.jmxfetch.RemoteConnection.<init>(RemoteConnection.java:101)
at org.datadog.jmxfetch.ConnectionFactory.createConnection(ConnectionFactory.java:38)
at org.datadog.jmxfetch.Instance.getConnection(Instance.java:403)
at org.datadog.jmxfetch.Instance.init(Instance.java:416)
at org.datadog.jmxfetch.InstanceInitializingTask.call(InstanceInitializingTask.java:15)
at org.datadog.jmxfetch.InstanceInitializingTask.call(InstanceInitializingTask.java:3)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: javax.naming.CommunicationException [Root exception is java.rmi.ConnectIOException: Exception creating connection to: 10.244.0.66; nested exception is: java.net.NoRouteToHostException: No route to host (Host unreachable)]
at jdk.naming.rmi/com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:137)
at java.naming/com.sun.jndi.toolkit.url.GenericURLContext.lookup(GenericURLContext.java:207)
at java.naming/javax.naming.InitialContext.lookup(InitialContext.java:409)
at java.management.rmi/javax.management.remote.rmi.RMIConnector.findRMIServerJNDI(RMIConnector.java:1839)
at java.management.rmi/javax.management.remote.rmi.RMIConnector.findRMIServer(RMIConnector.java:1813)
at java.management.rmi/javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:302)
... 12 more
Caused by: java.rmi.ConnectIOException: Exception creating connection to: 10.244.0.66; nested exception is:
java.net.NoRouteToHostException: No route to host (Host unreachable)
at java.rmi/sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:635)
at java.rmi/sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:209)
at java.rmi/sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:196)
at java.rmi/sun.rmi.server.UnicastRef.newCall(UnicastRef.java:343)
at java.rmi/sun.rmi.registry.RegistryImpl_Stub.lookup(RegistryImpl_Stub.java:116)
at jdk.naming.rmi/com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:133)
... 17 more
Caused by: java.net.NoRouteToHostException: No route to host (Host unreachable)
at org.datadog.jmxfetch.util.JmxfetchRmiClientSocketFactory.getSocketFromFactory(JmxfetchRmiClientSocketFactory.java:67)
at org.datadog.jmxfetch.util.JmxfetchRmiClientSocketFactory.createSocket(JmxfetchRmiClientSocketFactory.java:40)
at java.rmi/sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:617)
... 22 more
but when the host is with %% there's no error but it get nothing from the kafka pods.
What I'm doing wrong? or just what I have wrong on this setting? .-. I checked other answers and quesions and a lot of docs these last days just to get the kafka metrics and apparently One does not simply configure datadog for JMX autodiscovery in AKS with Strimzi/Kafka... I just need the topics metrics.
I know that Strimzi aims to have Prometheus Metrics, but I need Datadog and I already got scolded for trying the Prometheus option (bc I couldn't enable it and get the metrics from there to dd).
I feel like it has to be something with the annotations, but tbh idk.
Please help, I can't be the only one with this problem.