Kubernetes / Prometheus Metrics Mismatch

5/15/2019

I have an application running in Kubernetes (Azure AKS) in which each pod contains two containers. I also have Grafana set up to display various metrics, some of which come from Prometheus. While troubleshooting a separate issue I noticed that some metrics don't appear to match up between data sources.

For example, kube_deployment_status_replicas_available returns the value 30, whereas kubectl -n XXXXXXXX get pod lists 100 pods, all of which are Running, and kube_deployment_status_replicas_unavailable returns 0. If I get the deployment in question using kubectl, I also see the expected value:

$ kubectl get deployment XXXXXXXX
NAME       DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
XXXXXXXX   100       100       100          100         49d
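
For reference, both sides of that comparison can be reproduced from the command line. The prometheus service name and monitoring namespace below are placeholders for my setup; the metric labels are the standard kube-state-metrics ones:

$ # Count Running pods the same way kubectl get pod does
$ kubectl -n XXXXXXXX get pod --field-selector=status.phase=Running --no-headers | wc -l
$ # Query Prometheus directly, bypassing Grafana
$ kubectl -n monitoring port-forward svc/prometheus 9090:9090 &
$ curl -sG 'http://localhost:9090/api/v1/query' \
    --data-urlencode 'query=kube_deployment_status_replicas_available{namespace="XXXXXXXX",deployment="XXXXXXXX"}'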

There are other applications (namespaces) in the same cluster where all the values correlate correctly, so I'm not sure where the fault lies, or whether there's any way to know for sure which value is the correct one. Any guidance would be appreciated. Thanks.

-- olliefinn
kubernetes
prometheus

1 Answer

5/16/2019

Based on the kube_deployment_status_replicas_available metric, I assume you have Prometheus scraping kube-state-metrics. It sounds like there's something quirky about that deployment. It could be:

  • Serving cached/stale metric data
  • And/or simply unable to pull current state from the kube-apiserver (you can check its raw output directly, as shown below)
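
A quick way to tell which of those it is: scrape kube-state-metrics directly and see whether its raw output already disagrees with the kube-apiserver. The kube-system namespace and service name below are the defaults from the standard manifests; adjust for your install:

$ kubectl -n kube-system port-forward svc/kube-state-metrics 8080:8080 &
$ curl -s http://localhost:8080/metrics | grep 'kube_deployment_status_replicas_available{.*XXXXXXXX'

If the value here is already 30, kube-state-metrics itself is stale; if it's 100, the problem is somewhere on the Prometheus scrape side.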

I would (rough commands for each step are sketched below):

  • Check the version of kube-state-metrics you are running and make sure it's compatible with your K8s version (the project's README has a compatibility matrix).
  • Restart the kube-state-metrics pod.
  • Check the kube-state-metrics logs with kubectl logs.
  • Check the Prometheus logs.
    • If you don't see anything, try starting Prometheus with the --log.level=debug flag.
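
Roughly, those checks translate to the following. The kube-system namespace and the app=kube-state-metrics label are assumptions based on the common manifests; substitute whatever your deployment uses:

$ # 1. See which version (image tag) is running
$ kubectl -n kube-system get deploy kube-state-metrics -o jsonpath='{.spec.template.spec.containers[0].image}'
$ # 2. Restart it by deleting the pod; the Deployment will recreate it
$ kubectl -n kube-system delete pod -l app=kube-state-metrics
$ # 3. Check its logs
$ kubectl -n kube-system logs -l app=kube-state-metrics
$ # 4. Check the Prometheus logs (pod name depends on your install)
$ kubectl -n monitoring logs <prometheus-pod>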

Hope it helps.

-- Rico
Source: StackOverflow