Scrape CoreDNS metrics with External Prometheus

9/27/2020

I have a Kubernetes cluster (built with the Typhoon module) and a Prometheus instance in a different VPC (running with docker-compose, not on the Kubernetes cluster). The VPC peering connection is enabled and the required ports are open to this VPC. All the metrics are being scraped as expected, except for the CoreDNS pods. The issue here is that the CoreDNS pods are assigned 10.2.*.* IPs, which is different from the IP range I configured for the pods to run in.

If the CoreDNS pods got 172.*.*.* IPs, my Prometheus would be able to reach them and the scraping would be successful.

Now I'm not sure how to scrape these metrics. Please let me know what I'm doing wrong.

$ kubectl get pods -n kube-system -o wide | grep coredns

coredns-7d8995c4cd-4l4ft                   1/1     Running   1          7d1h    10.2.5.2        ip-172-*-*-*   <none>           <none>
coredns-7d8995c4cd-vxd9d                   1/1     Running   1          6d3h    10.2.3.9        ip-172-*-*-*   <none>           <none>
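
For reference, a quick check from the Prometheus host illustrates the routing gap (a sketch, assuming CoreDNS exposes its metrics on the default port 9153 and using a placeholder for the node address):

$ curl -sv --max-time 5 http://10.2.5.2:9153/metrics     # pod IP on the overlay network: not routable across the VPC peering, times out
$ curl -sv --max-time 5 http://<node-ip>:9153/metrics    # node IP is reachable over the peering, but CoreDNS is not listening on the host network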

My prometheus.yml file is configured with the job below.

  - job_name: 'kubernetes-service-endpoints'
    kubernetes_sd_configs:
      - role: endpoints
        api_server: https://kubernetes-cluster:6443
        tls_config:
          insecure_skip_verify: true
        bearer_token: "TOKEN"

    bearer_token: "TOKEN"

    honor_labels: true
    relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: job

    metric_relabel_configs:
      - source_labels: [__name__]
        action: drop
        regex: etcd_(debugging|disk|request|server).*
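
For context on the relabel rules above: the keep rule only retains Services that carry the prometheus.io/scrape annotation, and the port rule rewrites the target address from the prometheus.io/port annotation. A hedged sketch of how those annotations could be set on the CoreDNS Service (the Service name varies by distribution, often kube-dns or coredns, and 9153 is CoreDNS's default metrics port):

$ kubectl -n kube-system annotate service kube-dns \
    prometheus.io/scrape=true prometheus.io/port=9153

Even with the annotations in place, though, the endpoints role resolves __address__ to the pod IP, which is why the CoreDNS targets end up on 10.2.*.* here.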

P.S.: I'm using Flannel as my network CNI so that the pods get created with the IP of the host network itself.
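
To double-check which range the pods are actually drawn from, the per-node pod CIDR can be inspected (Typhoon's default pod CIDR is 10.2.0.0/16, which matches the addresses above):

$ kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'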

Updated info: I tried deploying Prometheus on Kubernetes and federating that data to my Prometheus instance running on Docker, as suggested by Yaron.

I'm trying the config below for federation, but I'm not seeing any metrics loaded into my target Prometheus.

  - job_name: 'federate'
    scrape_interval: 10s

    honor_labels: true
    metrics_path: '/federate'

    params:
      'match[]':
        - '{job="prometheus"}'
        - '{job="kubernetes-nodes"}'
        - '{job="kubernetes-apiservers"}'
        - '{job="kubernetes-service-endpoints"}'
        - '{job="kubernetes-cadvisor"}'
        - '{job="kubelet"}'
        - '{job="etcd"}'
        - '{job="kubernetes-services"}'
        - '{job="kubernetes-pods"}'
    scheme: https
    static_configs:
    - targets:
      - prom.mycompany.com
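
A direct check against the /federate endpoint can show whether the match[] selectors return anything at all (a sketch reusing the target above; add -k or --cacert to match your TLS setup):

$ curl -sG 'https://prom.mycompany.com/federate' \
    --data-urlencode 'match[]={job="kubernetes-service-endpoints"}'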
-- Vamshi Siddarth
coredns
flannel
kubernetes
prometheus

1 Answer

9/27/2020

The best practice for solving this issue is to run a Prometheus instance inside the cluster that runs CoreDNS, and to federate the metrics scraped by that Prometheus into your external Prometheus running with docker-compose.
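
For example, the in-cluster Prometheus could pick up CoreDNS with a plain pod-role job along these lines (a sketch: the k8s-app label and the 9153 metrics port are the common defaults, so adjust to your deployment):

  - job_name: 'coredns'
    # running inside the cluster, so no api_server or bearer_token is needed
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names: ['kube-system']
    relabel_configs:
      # keep only the CoreDNS pods (commonly labelled k8s-app: kube-dns)
      - source_labels: [__meta_kubernetes_pod_label_k8s_app]
        action: keep
        regex: kube-dns|coredns
      # scrape the pod IP on the metrics port
      - source_labels: [__meta_kubernetes_pod_ip]
        action: replace
        target_label: __address__
        replacement: $1:9153

The external Prometheus then only needs the /federate job you already configured against this instance.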

You can read more about federation in the Prometheus documentation to get an idea of how to start leveraging it.

A more advanced setup would be to use Thanos to better distribute queries across your different Prometheus servers, but the main point remains the same: run an internal Prometheus server within each of your clusters.

-- Yaron Idan
Source: StackOverflow