I have a kubernetes cluster(built with Typhoon module) and a Prometheus instance in different VPC(running on docker-compose, not on Kubernetes cluster). I have the vpc peering connection enabled and required ports are open to this vpc. All the metrics are being scraped as expected except for coredns pod. The issue here is coredns pods are assigned with 10.2.. IP which is different from my IP range configured for the pods to run.
If coredns pod gets the IP 172...*, my prometheus will be able to resolve it and the scraping will be successful.
Now, I'm not sure how to scrape this metrics. Please let me know if you know what am I doing wrong.
$ kubectl get pods -n kube-system -o wide | grep coredns
coredns-7d8995c4cd-4l4ft 1/1 Running 1 7d1h 10.2.5.2 ip-172-*-*-* <none> <none>
coredns-7d8995c4cd-vxd9d 1/1 Running 1 6d3h 10.2.3.9 ip-172-*-*-* <none> <none>
Prometheus.yml file is configured with the below job.
- job_name: 'kubernetes-service-endpoints'
kubernetes_sd_configs:
- role: endpoints
api_server: https://kubernetes-cluster:6443
tls_config:
insecure_skip_verify: true
bearer_token: "TOKEN"
bearer_token: "TOKEN"
honor_labels: true
relabel_configs:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
action: replace
target_label: __scheme__
regex: (https?)
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
action: replace
target_label: __address__
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: namespace
- source_labels: [__meta_kubernetes_pod_name]
action: replace
target_label: pod
- source_labels: [__meta_kubernetes_service_name]
action: replace
target_label: job
metric_relabel_configs:
- source_labels: [__name__]
action: drop
regex: etcd_(debugging|disk|request|server).*
P.S: I'm using Flannel as my network CNI so that I get the pods created with the IP of the host network itself.
Updated Info: I tried deploying the prometheus on kubernetes and trying to federate this data to my prometheus docker as suggested by Yaron.
I'm trying the below config for the federation but not seeing any metrics loaded to my target prometheus.
- job_name: 'federate'
scrape_interval: 10s
honor_labels: true
metrics_path: '/federate'
params:
'match[]':
- '{job="prometheus"}'
- '{job="kubernetes-nodes"}'
- '{job="kubernetes-apiservers"}'
- '{job="kubernetes-service-endpoints"}'
- '{job="kubernetes-cadvisor"}'
- '{job="kubelet"}'
- '{job="etcd"}'
- '{job="kubernetes-services"}'
- '{job="kubernetes-pods"}'
scheme: https
static_configs:
- targets:
- prom.mycompany.com
The best practice for solving this issue is running a prometheus instance inside the cluster running Coredns, and federating the metrics scraped by that prometheus into your external prometheus running with docker-compose.
You can read more about federation here, to get an idea of how to start leveraging it.
A more advanced use case would be using Thanos to better distribute queries across your different prometheus servers, but the main point remains running an internal prometheus server within each of your clusters.