We are facing an issue where some endpoints of the Prometheus job "kubernetes-nodes" are in the "UNKNOWN" state.
The nodes and Prometheus have all been up for several days. We tried to curl those "kubernetes-nodes" endpoints that are in the "UNKNOWN" state; the metrics can be curled correctly, but the endpoint state is still "UNKNOWN". We don't know the criteria under which an endpoint is marked "UNKNOWN".
I know that before Prometheus does its first scrape, endpoints are in the "UNKNOWN" state. Then, if the scrape succeeds, the endpoint becomes "UP"; if it fails, "DOWN". However, in the screenshot below it seems some endpoints have never been scraped... We just don't know why.
Could you please give advice about the possible reasons for this? Does it mean this node (name hidden in the red block...) has something wrong? If so, is it possible to fix it so that Prometheus treats it as "UP"?
Thanks in advance.
- job_name: kubernetes-nodes
  scrape_interval: 1m
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: https
  kubernetes_sd_configs:
  - api_server: null
    role: node
    namespaces:
      names: []
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  relabel_configs:
  - separator: ;
    regex: __meta_kubernetes_node_label_(.+)
    replacement: $1
    action: labelmap
  - separator: ;
    regex: (.*)
    target_label: __address__
    replacement: kubernetes.default.svc:443
    action: replace
  - source_labels: [__meta_kubernetes_node_name]
    separator: ;
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics
    action: replace
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
I think you're missing the nodes/proxy resource in your Prometheus ClusterRole. Note that with this config Prometheus does not scrape the kubelet directly: the relabel rules rewrite the target to kubernetes.default.svc:443 with the path /api/v1/nodes/<name>/proxy/metrics, so the scrape goes through the API-server proxy and needs RBAC permission on nodes/proxy, which is why a direct curl to the node can succeed while the scrape never does. Here's the official example: github.com/prometheus/documentation/examples/rbac-setup.yml.
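For reference, a minimal sketch of the RBAC objects the official example sets up, with nodes/proxy included. The resource list follows the rbac-setup.yml example; the ClusterRole/ClusterRoleBinding name "prometheus" and the ServiceAccount namespace "default" are assumptions you should adapt to your deployment.

# Sketch based on the official rbac-setup.yml example.
# nodes/proxy is what authorizes scrapes of /api/v1/nodes/<name>/proxy/metrics.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus              # assumed name
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy                 # required for the API-server proxy scrape path
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus              # assumed name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus              # assumed ServiceAccount used by your Prometheus pod
  namespace: default            # assumed namespace; use the one Prometheus runs in

After applying it (kubectl apply -f rbac-setup.yml), the targets should flip from "UNKNOWN" to "UP" within a scrape interval or two, provided the Prometheus pod actually runs under the bound ServiceAccount.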