Using telegraf as daemonset to send metrics of kubernetes pods/containers

5/9/2019

First of all, I'd like to understand something clearly: if I run Telegraf as a DaemonSet in a Kubernetes cluster, will it collect the metrics of the pods, or the metrics of the physical nodes?

I've created a Telegraf DaemonSet in my test Kubernetes cluster, which runs on my laptop under Hyper-V and is based on this Kubernetes cluster installation:

I would like to collect the pod metrics, but they never arrive at the Kafka machine. I get this error in the logs:

2019-05-08T02:36:35Z I! Starting Telegraf 1.9.2
2019-05-08T02:36:35Z I! Using config file: /etc/telegraf/telegraf.conf
2019-05-08T02:46:36Z E! [agent] Failed to connect to output kafka, retrying in 15s, error was 'kafka: client has run out of available brokers to talk to (Is your cluster reachable?)'
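That message comes from Sarama, the Go Kafka client Telegraf's kafka output uses, and it roughly means the client could not talk to any address in the brokers list. A quick way to separate a network problem from a Telegraf problem is a plain TCP connect to each broker from the same network the pod uses. The sketch below is a minimal Python version of that check; check_brokers and the 3-second timeout are illustrative assumptions, not part of Telegraf.

```python
import socket
from typing import Dict, List


def check_brokers(brokers: List[str], timeout: float = 3.0) -> Dict[str, str]:
    """Try a TCP connect to each "host:port" and report "ok" or the error."""
    results = {}
    for addr in brokers:
        host, _, port = addr.rpartition(":")
        try:
            # Only the TCP handshake is tested; no Kafka protocol is spoken.
            with socket.create_connection((host, int(port)), timeout=timeout):
                results[addr] = "ok"
        except OSError as exc:
            # Covers timeouts, "connection refused" and "no route to host"
            results[addr] = f"unreachable: {exc}"
    return results
```

Calling it with the five 10.121.x.x:9092 addresses from the ConfigMap, from a pod or node that has Python available, should show whether any broker can be reached at all.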

This is the daemonset definition file:

apiVersion: v1
kind: ConfigMap
metadata:
  name: telegraf
  namespace: monitoring
  labels:
    k8s-app: telegraf
data:
  telegraf.conf: |+
    [global_tags]
      env = "$ENV"
    [agent]
      hostname = "$HOSTNAME"
      interval = "60s"
      round_interval = true
      metric_batch_size = 1000
      metric_buffer_limit = 10000
      collection_jitter = "0s"
      flush_interval = "10s"
      flush_jitter = "2s"
      precision = ""
      debug = false
      quiet = true
      logfile = ""

    [[outputs.kafka]]
      brokers = ["10.121.63.5:9092", "10.121.63.18:9092", "10.121.62.64:9092", "10.121.62.80:9092", "10.121.63.22:9092"]
      topic = "telegraf-measurements-json"
      client_id = "golangsarama__1.18.0__serverinfra__telegraf"
      routing_tag = "host"
      version = "0.11.0.2"
      compression_codec = 2
      required_acks = 1
      data_format = "json"

    [[inputs.cpu]]
      percpu = true
      totalcpu = true
      collect_cpu_time = false
      report_active = false
    [[inputs.disk]]
      ignore_fs = ["tmpfs", "devtmpfs", "devfs"]
    [[inputs.diskio]]
    [[inputs.kernel]]
    [[inputs.mem]]
    [[inputs.processes]]
    [[inputs.swap]]
    [[inputs.system]]
    [[inputs.docker]]
      endpoint = "unix:///var/run/docker.sock"
    [[inputs.kubernetes]]
      url = "https://192.168.213.18:6443"
      insecure_skip_verify = true

---
# Section: Daemonset
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: telegraf
  namespace: monitoring
  labels:
    k8s-app: telegraf
spec:
  selector:
    matchLabels:
      name: telegraf
  template:
    metadata:
      labels:
        name: telegraf
    spec:
      containers:
      - name: telegraf
        image: docker.io/telegraf:1.9.2
        resources:
          limits:
            memory: 500Mi
          requests:
            cpu: 500m
            memory: 500Mi
        env:
        - name: HOSTNAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: "HOST_PROC"
          value: "/rootfs/proc"
        - name: "HOST_SYS"
          value: "/rootfs/sys"
        - name: ENV
          valueFrom:
            secretKeyRef:
              name: telegraf
              key: env
        volumeMounts:
        - name: sys
          mountPath: /rootfs/sys
          readOnly: true
        - name: proc
          mountPath: /rootfs/proc
          readOnly: true
        - name: docker-socket
          mountPath: /var/run/docker.sock
        - name: utmp
          mountPath: /var/run/utmp
          readOnly: true
        - name: config
          mountPath: /etc/telegraf
      terminationGracePeriodSeconds: 30
      volumes:
      - name: sys
        hostPath:
          path: /sys
      - name: docker-socket
        hostPath:
          path: /var/run/docker.sock
      - name: proc
        hostPath:
          path: /proc
      - name: utmp
        hostPath:
          path: /var/run/utmp
      - name: config
        configMap:
          name: telegraf

This is the article that I followed to create a daemonset.
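On the node-versus-pod question: the cpu, mem, disk, diskio, kernel, processes, swap and system inputs are built on the gopsutil library, which reads the kernel's /proc and /sys files and honors the HOST_PROC and HOST_SYS environment variables. Because this DaemonSet mounts the host's /proc and /sys at /rootfs/proc and /rootfs/sys and points those variables there, these inputs report metrics of the node each Telegraf pod runs on, not of individual pods. The Python sketch below only illustrates that path-resolution idea with /proc/loadavg; proc_root and read_loadavg are hypothetical helpers, not Telegraf code.

```python
import os
from pathlib import Path
from typing import Tuple


def proc_root() -> Path:
    # Same convention as gopsutil: use HOST_PROC if set, else the container's own /proc
    return Path(os.environ.get("HOST_PROC", "/proc"))


def read_loadavg() -> Tuple[float, float, float]:
    # /proc/loadavg looks like "0.52 0.58 0.59 1/523 12345"; keep the 1/5/15-minute averages
    fields = (proc_root() / "loadavg").read_text().split()
    return float(fields[0]), float(fields[1]), float(fields[2])
```

With HOST_PROC=/rootfs/proc, as in the DaemonSet, this reads the node's loadavg file through the hostPath mount.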

Here are the pods:

NAMESPACE     NAME                                 READY   STATUS    RESTARTS   AGE
default       nginx-65f88748fd-jztrz               1/1     Running   0          7d18h
kube-system   coredns-fb8b8dccf-rl48l              1/1     Running   0          7d18h
kube-system   coredns-fb8b8dccf-x8fvx              1/1     Running   0          7d18h
kube-system   etcd-k8s-master                      1/1     Running   2          7d18h
kube-system   kube-apiserver-k8s-master            1/1     Running   2          7d18h
kube-system   kube-controller-manager-k8s-master   1/1     Running   0          7d18h
kube-system   kube-flannel-ds-amd64-96tsl          1/1     Running   0          7d18h
kube-system   kube-flannel-ds-amd64-b884r          1/1     Running   0          7d18h
kube-system   kube-flannel-ds-amd64-pdqmq          1/1     Running   0          7d18h
kube-system   kube-proxy-42k2g                     1/1     Running   0          7d18h
kube-system   kube-proxy-77pw9                     1/1     Running   0          7d18h
kube-system   kube-proxy-n5mbs                     1/1     Running   0          7d18h
kube-system   kube-scheduler-k8s-master            1/1     Running   2          7d18h
monitoring    telegraf-dvtcl                       1/1     Running   5          117m
monitoring    telegraf-n2mqz                       1/1     Running   5          117m
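For per-container, and so per-pod, numbers in this setup, the docker input is the relevant one: it talks to the Docker Engine API through the mounted /var/run/docker.sock. When Kubernetes runs containers on Docker, the kubelet labels each container with io.kubernetes.pod.namespace and io.kubernetes.pod.name, which is how container metrics can be tied back to pods like the ones listed above. The Python sketch below lists containers through the socket via the Engine API's GET /containers/json endpoint and pulls out those labels; UnixHTTPConnection and list_pod_containers are hypothetical helpers, not part of Telegraf.

```python
import http.client
import json
import socket
from typing import List, Optional, Tuple


class UnixHTTPConnection(http.client.HTTPConnection):
    """http.client connection that speaks HTTP over a Unix domain socket."""

    def __init__(self, socket_path: str, timeout: float = 5.0):
        # "localhost" only fills the Host header; the socket path decides where we connect
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def list_pod_containers(
    socket_path: str = "/var/run/docker.sock",
) -> List[Tuple[Optional[str], Optional[str], str]]:
    """Return (pod namespace, pod name, container name) for each running container."""
    conn = UnixHTTPConnection(socket_path)
    try:
        # GET /containers/json lists running containers (Docker Engine API)
        conn.request("GET", "/containers/json")
        containers = json.loads(conn.getresponse().read())
    finally:
        conn.close()
    rows = []
    for c in containers:
        labels = c.get("Labels") or {}
        # Containers not started by the kubelet lack these labels, giving None
        rows.append((labels.get("io.kubernetes.pod.namespace"),
                     labels.get("io.kubernetes.pod.name"),
                     c["Names"][0]))
    return rows
```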

tcpdump shows that something is being sent from the DaemonSet:

09:52:59.002901 IP 192.168.1.10.45546 > sdsfdsf.XmlIpcRegSvc: Flags [S], seq 3040818525, win 28200, options [mss 1410,sackOK,TS val 158999344 ecr 0,nop,wscale 7], length 0
09:52:59.002901 IP 192.168.1.10.45546 > sdsfdsf.XmlIpcRegSvc: Flags [S], seq 3040818525, win 28200, options [mss 1410,sackOK,TS val 158999344 ecr 0,nop,wscale 7], length 0
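In that capture, the packets shown are SYNs (Flags [S]) from 192.168.1.10 to port XmlIpcRegSvc, which is the /etc/services name for TCP 9092, the Kafka port in the broker list. The excerpt shows no SYN-ACK coming back, which fits a TCP connection that is never established. The Python sketch below parses tcpdump summary lines and lists connections that sent a SYN but never saw a SYN-ACK; the regular expression assumes the default one-line tcpdump format shown above and is not a general pcap parser, and unanswered_syns is a hypothetical helper.

```python
import re
from typing import Iterable, Set, Tuple

# Matches summary lines such as:
# 09:52:59.002901 IP 192.168.1.10.45546 > host.XmlIpcRegSvc: Flags [S], seq ...
LINE = re.compile(r"IP (\S+)\.([\w-]+) > (\S+)\.([\w-]+): Flags \[([^\]]+)\]")


def unanswered_syns(lines: Iterable[str]) -> Set[Tuple[str, str]]:
    """Return (client, server) pairs with a SYN but no SYN-ACK in the capture."""
    syns, synacks = set(), set()
    for line in lines:
        m = LINE.search(line)
        if not m:
            continue  # skip hex/ASCII payload dump lines
        src = f"{m.group(1)}.{m.group(2)}"
        dst = f"{m.group(3)}.{m.group(4)}"
        flags = m.group(5)
        if flags == "S":
            syns.add((src, dst))
        elif flags == "S.":
            # A SYN-ACK travels server -> client, so store it the other way round
            synacks.add((dst, src))
    return syns - synacks
```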

But I can't see anything on our Grafana dashboard. If I install a standalone RPM-based Telegraf on the nodes, it sends the metrics out and I can see them. But I'm interested in the pod metrics.
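On pod metrics specifically: Telegraf's kubernetes input is documented as reading pod and container statistics from the kubelet on each node (its default url is http://127.0.0.1:10255), whereas the url in the ConfigMap points at port 6443, which in a kubeadm-style cluster is normally the API server. The kubelet serves these statistics at /stats/summary as JSON with a pods array. The Python sketch below shows how per-pod CPU and memory can be read out of such a document; summarize_pods and the sample numbers are made up for illustration, and only the field names (podRef, containers, cpu.usageNanoCores, memory.workingSetBytes) follow the kubelet summary API.

```python
from typing import Dict, Tuple


def summarize_pods(summary: dict) -> Dict[Tuple[str, str], Tuple[int, int]]:
    """Sum container CPU (nanocores) and working-set memory (bytes) per pod."""
    totals = {}
    for pod in summary.get("pods", []):
        ref = pod["podRef"]
        cpu = mem = 0
        for c in pod.get("containers", []):
            # Treat a missing cpu or memory block as zero
            cpu += c.get("cpu", {}).get("usageNanoCores", 0)
            mem += c.get("memory", {}).get("workingSetBytes", 0)
        totals[(ref["namespace"], ref["name"])] = (cpu, mem)
    return totals


# Hypothetical, heavily trimmed /stats/summary response (numbers are made up)
SAMPLE = {
    "pods": [
        {
            "podRef": {"name": "nginx-65f88748fd-jztrz", "namespace": "default"},
            "containers": [
                {"name": "nginx",
                 "cpu": {"usageNanoCores": 1500000},
                 "memory": {"workingSetBytes": 2097152}},
            ],
        },
    ],
}
```

For the sample, the nginx pod comes out as 1,500,000 nanocores (1.5 millicores) and 2,097,152 bytes (2 MiB).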

-- Badb0y
Tags: apache-kafka, kubernetes, metrics, telegraf

1 Answer

8/21/2019

This error from Telegraf just means that no connection could be made to any of the brokers in the brokers array of your config, all of which are in the private 10.x.x.x (class A) range. Depending on how you set up networking and routing, you may simply have a routing issue reaching the private IPs of your Kafka cluster.
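One quick sanity check on the routing side is that the broker addresses do not fall inside the cluster's own pod or service ranges, where traffic to them would be routed inside the cluster instead of out to Kafka. The Python sketch below does that with the standard ipaddress module; the CIDRs used, 10.244.0.0/16 (flannel's usual pod network) and 10.96.0.0/12 (kubeadm's default service range), are assumptions and should be replaced with the cluster's actual values, and overlapping_ranges is a hypothetical helper.

```python
import ipaddress
from typing import Dict, List


def overlapping_ranges(brokers: List[str], cluster_cidrs: List[str]) -> Dict[str, List[str]]:
    """Map each broker host to the cluster CIDRs that contain it (empty if none)."""
    nets = [ipaddress.ip_network(c) for c in cluster_cidrs]
    result = {}
    for addr in brokers:
        ip = ipaddress.ip_address(addr.rpartition(":")[0])
        result[str(ip)] = [str(n) for n in nets if ip in n]
    return result


# Broker list from the question's telegraf.conf
BROKERS = ["10.121.63.5:9092", "10.121.63.18:9092", "10.121.62.64:9092",
           "10.121.62.80:9092", "10.121.63.22:9092"]
# Assumed defaults: flannel pod network and kubeadm service range
CLUSTER_CIDRS = ["10.244.0.0/16", "10.96.0.0/12"]
```

An empty list for every broker, as with these assumed ranges, rules out an address clash and leaves the route from the pod network to 10.121.x.x as the thing to check.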

-- jordo1138
Source: StackOverflow