First of all, I'd like to clearly understand one thing: if I run a Telegraf DaemonSet in a Kubernetes cluster, will it collect the metrics of the pods, or the metrics of the physical nodes?
I've created a Telegraf DaemonSet in my test Kubernetes cluster, which runs on my laptop under Hyper-V, based on this Kubernetes cluster installation:
I would like to collect the pod metrics, but they don't arrive at the Kafka machine. I get this error in the logs:
2019-05-08T02:36:35Z I! Starting Telegraf 1.9.2
2019-05-08T02:36:35Z I! Using config file: /etc/telegraf/telegraf.conf
2019-05-08T02:46:36Z E! [agent] Failed to connect to output kafka, retrying in 15s, error was 'kafka: client has run out of available brokers to talk to (Is your cluster reachable?)'
This is the daemonset definition file:
apiVersion: v1
kind: ConfigMap
metadata:
  name: telegraf
  namespace: monitoring
  labels:
    k8s-app: telegraf
data:
  telegraf.conf: |+
    [global_tags]
      env = "$ENV"
    [agent]
      hostname = "$HOSTNAME"
      interval = "60s"
      round_interval = true
      metric_batch_size = 1000
      metric_buffer_limit = 10000
      collection_jitter = "0s"
      flush_interval = "10s"
      flush_jitter = "2s"
      precision = ""
      debug = false
      quiet = true
      logfile = ""
    [[outputs.kafka]]
      brokers = ["10.121.63.5:9092", "10.121.63.18:9092", "10.121.62.64:9092", "10.121.62.80:9092", "10.121.63.22:9092"]
      topic = "telegraf-measurements-json"
      client_id = "golangsarama__1.18.0__serverinfra__telegraf"
      routing_tag = "host"
      version = "0.11.0.2"
      compression_codec = 2
      required_acks = 1
      data_format = "json"
    [[inputs.cpu]]
      percpu = true
      totalcpu = true
      collect_cpu_time = false
      report_active = false
    [[inputs.disk]]
      ignore_fs = ["tmpfs", "devtmpfs", "devfs"]
    [[inputs.diskio]]
    [[inputs.kernel]]
    [[inputs.mem]]
    [[inputs.processes]]
    [[inputs.swap]]
    [[inputs.system]]
    [[inputs.docker]]
      endpoint = "unix:///var/run/docker.sock"
    [[inputs.kubernetes]]
      url = "https://192.168.213.18:6443"
      insecure_skip_verify = true
---
# Section: Daemonset
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: telegraf
  namespace: monitoring
  labels:
    k8s-app: telegraf
spec:
  selector:
    matchLabels:
      name: telegraf
  template:
    metadata:
      labels:
        name: telegraf
    spec:
      containers:
      - name: telegraf
        image: docker.io/telegraf:1.9.2
        resources:
          limits:
            memory: 500Mi
          requests:
            cpu: 500m
            memory: 500Mi
        env:
        - name: HOSTNAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: "HOST_PROC"
          value: "/rootfs/proc"
        - name: "HOST_SYS"
          value: "/rootfs/sys"
        - name: ENV
          valueFrom:
            secretKeyRef:
              name: telegraf
              key: env
        volumeMounts:
        - name: sys
          mountPath: /rootfs/sys
          readOnly: true
        - name: proc
          mountPath: /rootfs/proc
          readOnly: true
        - name: docker-socket
          mountPath: /var/run/docker.sock
        - name: utmp
          mountPath: /var/run/utmp
          readOnly: true
        - name: config
          mountPath: /etc/telegraf
      terminationGracePeriodSeconds: 30
      volumes:
      - name: sys
        hostPath:
          path: /sys
      - name: docker-socket
        hostPath:
          path: /var/run/docker.sock
      - name: proc
        hostPath:
          path: /proc
      - name: utmp
        hostPath:
          path: /var/run/utmp
      - name: config
        configMap:
          name: telegraf
This is the article that I followed to create the DaemonSet.
Here are the pods:
NAMESPACE     NAME                                 READY   STATUS    RESTARTS   AGE
default       nginx-65f88748fd-jztrz               1/1     Running   0          7d18h
kube-system   coredns-fb8b8dccf-rl48l              1/1     Running   0          7d18h
kube-system   coredns-fb8b8dccf-x8fvx              1/1     Running   0          7d18h
kube-system   etcd-k8s-master                      1/1     Running   2          7d18h
kube-system   kube-apiserver-k8s-master            1/1     Running   2          7d18h
kube-system   kube-controller-manager-k8s-master   1/1     Running   0          7d18h
kube-system   kube-flannel-ds-amd64-96tsl          1/1     Running   0          7d18h
kube-system   kube-flannel-ds-amd64-b884r          1/1     Running   0          7d18h
kube-system   kube-flannel-ds-amd64-pdqmq          1/1     Running   0          7d18h
kube-system   kube-proxy-42k2g                     1/1     Running   0          7d18h
kube-system   kube-proxy-77pw9                     1/1     Running   0          7d18h
kube-system   kube-proxy-n5mbs                     1/1     Running   0          7d18h
kube-system   kube-scheduler-k8s-master            1/1     Running   2          7d18h
monitoring    telegraf-dvtcl                       1/1     Running   5          117m
monitoring    telegraf-n2mqz                       1/1     Running   5          117m
tcpdump shows that something is being sent from the DaemonSet:
09:52:59.002901 IP 192.168.1.10.45546 > sdsfdsf.XmlIpcRegSvc: Flags [S], seq 3040818525, win 28200, options [mss 1410,sackOK,TS val 158999344 ecr 0,nop,wscale 7], length 0
E..<2.@.@......
y?...#..?5]......n(._.........
z#0........................
09:52:59.002901 IP 192.168.1.10.45546 > sdsfdsf.XmlIpcRegSvc: Flags [S], seq 3040818525, win 28200, options [mss 1410,sackOK,TS val 158999344 ecr 0,nop,wscale 7], length 0
E..<2.@.@......
y?...#..?5]......n(._.........
But I can't see anything on our Grafana dashboard. If I install a standalone RPM-based Telegraf on the nodes, it sends the metrics out and I can see them. But I'm curious about the pod metrics.
This error from Telegraf just means that no connection could be made to any of the 10.x.x.x broker addresses listed in the brokers array of your config. Depending on how you set up networking and routing, you may simply have a routing issue between the pods and the private IPs where your Kafka cluster lives.
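One way to confirm or rule out a routing problem is to try a plain TCP connection to each broker from inside the cluster. The following is only a minimal sketch: the broker list is copied from the config in the question, the helper names `can_connect` and `check_brokers` are made up for illustration, and it assumes you run it from a pod (or node) that has Python 3 available, which the stock Telegraf image may not.

```python
import socket

# Broker addresses copied from the [[outputs.kafka]] section in the question.
BROKERS = [
    "10.121.63.5:9092",
    "10.121.63.18:9092",
    "10.121.62.64:9092",
    "10.121.62.80:9092",
    "10.121.63.22:9092",
]


def can_connect(address, timeout=3.0):
    """Return True if a TCP connection to 'host:port' succeeds within timeout.

    Hypothetical helper for illustration; it only tests TCP reachability,
    it does not speak the Kafka protocol.
    """
    host, _, port = address.rpartition(":")
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks and timeouts.
        return False


def check_brokers(brokers, timeout=3.0):
    """Map each broker address to whether a TCP connection could be opened."""
    return {broker: can_connect(broker, timeout) for broker in brokers}


if __name__ == "__main__":
    for broker, ok in check_brokers(BROKERS).items():
        print(f"{broker}: {'reachable' if ok else 'UNREACHABLE'}")
```

If every broker shows as unreachable from a pod but reachable from the node itself, that points at pod-to-external routing or NAT rather than at Telegraf. A successful TCP connect is necessary but not sufficient: a Kafka client must also be able to reach the addresses the brokers advertise.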