I have deployed Prometheus outside of my Kubernetes cluster, and I want to use it to monitor the cluster. Unfortunately, I have run into a lot of problems.
Such as:
Here is my deploy script:
docker run -it -d --name prometheus -p 9090:9090 \
    --user 1000 \
    -v /home/prometheus/prometheus:/etc/prometheus/ \
    -v /home/prometheus/data:/prometheus \
    quay.io/prometheus/prometheus:v2.20.1
And here is my prometheus.yml:
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']
  - job_name: kubernetes-apiservers
    metrics_path: /metrics
    # metrics_path: /
    scrape_interval: 10s
    # scrape_interval: 1m
    scrape_timeout: 10s
    scheme: https
    tls_config:
      insecure_skip_verify: true
      # ca_file: /etc/prometheus/ca.crt
    kubernetes_sd_configs:
    - api_server: https://192.168.0.146:6443
      role: endpoints
      bearer_token_file: /etc/prometheus/prome.token
      tls_config:
        insecure_skip_verify: true
      # namespaces:
      #   names: []
    relabel_configs:
    - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
      separator: ;
      regex: default;kubernetes;https
      replacement: $1
      action: keep
  - job_name: kubernetes-nodes
    metrics_path: /metrics
    scrape_interval: 10s
    scrape_timeout: 10s
    scheme: https
    tls_config:
      insecure_skip_verify: true
      # ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    kubernetes_sd_configs:
    - api_server: https://192.168.0.146:6443
      role: node
      bearer_token_file: /etc/prometheus/prome.token
      tls_config:
        insecure_skip_verify: true
        # ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      namespaces:
        names: []
    relabel_configs:
    - separator: ;
      regex: __meta_kubernetes_node_label_(.+)
      replacement: $1
      action: labelmap
    - separator: ;
      regex: (.*)
      target_label: __address__
      replacement: kubernetes.default.svc:443
      action: replace
    - source_labels: [__meta_kubernetes_node_name]
      separator: ;
      regex: (.+)
      target_label: __metrics_path__
      replacement: /api/v1/nodes/${1}/proxy/metrics
      action: replace
I also checked the Prometheus log and found nothing suspicious:
[root@company-server-121 prometheus]# docker logs -f prometheus --tail 100
level=info ts=2020-09-04T02:55:49.571Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1599170400000 maxt=1599177600000 ulid=01EHBA1FG79CSJGAEFKEBKX8WA
level=info ts=2020-09-04T02:55:49.577Z caller=head.go:641 component=tsdb msg="Replaying on-disk memory mappable chunks if any"
level=info ts=2020-09-04T02:55:49.578Z caller=head.go:655 component=tsdb msg="On-disk memory mappable chunks replay completed" duration=1.465365ms
level=info ts=2020-09-04T02:55:49.579Z caller=head.go:661 component=tsdb msg="Replaying WAL, this may take a while"
level=info ts=2020-09-04T02:55:49.583Z caller=head.go:687 component=tsdb msg="WAL checkpoint loaded"
level=info ts=2020-09-04T02:55:49.613Z caller=head.go:713 component=tsdb msg="WAL segment loaded" segment=35 maxSegment=39
level=info ts=2020-09-04T02:55:49.644Z caller=head.go:713 component=tsdb msg="WAL segment loaded" segment=36 maxSegment=39
level=info ts=2020-09-04T02:55:49.676Z caller=head.go:713 component=tsdb msg="WAL segment loaded" segment=37 maxSegment=39
level=info ts=2020-09-04T02:55:49.703Z caller=head.go:713 component=tsdb msg="WAL segment loaded" segment=38 maxSegment=39
level=info ts=2020-09-04T02:55:49.704Z caller=head.go:713 component=tsdb msg="WAL segment loaded" segment=39 maxSegment=39
level=info ts=2020-09-04T02:55:49.704Z caller=head.go:716 component=tsdb msg="WAL replay completed" checkpoint_replay_duration=4.494795ms wal_replay_duration=121.056294ms total_replay_duration=127.049792ms
level=info ts=2020-09-04T02:55:49.707Z caller=main.go:700 fs_type=EXT4_SUPER_MAGIC
level=info ts=2020-09-04T02:55:49.707Z caller=main.go:701 msg="TSDB started"
level=info ts=2020-09-04T02:55:49.707Z caller=main.go:805 msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml
level=info ts=2020-09-04T02:55:49.708Z caller=main.go:833 msg="Completed loading of configuration file" filename=/etc/prometheus/prometheus.yml
level=info ts=2020-09-04T02:55:49.708Z caller=main.go:652 msg="Server is ready to receive web requests."
level=info ts=2020-09-04T03:00:01.264Z caller=compact.go:495 component=tsdb msg="write block" mint=1599177600000 maxt=1599184800000 ulid=01EHBGWZ2SH6X5N3JK6T4NRM2Z duration=23.544803ms
level=info ts=2020-09-04T03:00:01.269Z caller=head.go:804 component=tsdb msg="Head GC completed" duration=1.195693ms
level=info ts=2020-09-04T03:00:01.269Z caller=checkpoint.go:96 component=tsdb msg="Creating checkpoint" from_segment=35 to_segment=37 mint=1599184800000
level=info ts=2020-09-04T03:00:01.302Z caller=head.go:884 component=tsdb msg="WAL checkpoint complete" first=35 last=37 duration=32.653413ms
level=info ts=2020-09-04T03:00:01.328Z caller=compact.go:441 component=tsdb msg="compact blocks" count=3 mint=1599156000000 maxt=1599177600000 ulid=01EHBGWZ4S5F3GJ259C7070CYK sources="[01EHAWA1070CVB7X1BZ8CE9TXH 01EHB35R87CQVVJYNHBB2T76G6 01EHBA1FG79CSJGAEFKEBKX8WA]" duration=23.473664ms
level=warn ts=2020-09-04T03:38:34.452Z caller=main.go:530 msg="Received SIGTERM, exiting gracefully..."
level=info ts=2020-09-04T03:38:34.453Z caller=main.go:553 msg="Stopping scrape discovery manager..."
level=info ts=2020-09-04T03:38:34.453Z caller=main.go:567 msg="Stopping notify discovery manager..."
level=info ts=2020-09-04T03:38:34.453Z caller=main.go:589 msg="Stopping scrape manager..."
level=info ts=2020-09-04T03:38:34.453Z caller=main.go:549 msg="Scrape discovery manager stopped"
level=info ts=2020-09-04T03:38:34.453Z caller=main.go:563 msg="Notify discovery manager stopped"
level=info ts=2020-09-04T03:38:34.453Z caller=manager.go:888 component="rule manager" msg="Stopping rule manager..."
level=info ts=2020-09-04T03:38:34.453Z caller=manager.go:898 component="rule manager" msg="Rule manager stopped"
level=info ts=2020-09-04T03:38:34.453Z caller=main.go:583 msg="Scrape manager stopped"
level=info ts=2020-09-04T03:38:34.454Z caller=notifier.go:601 component=notifier msg="Stopping notification manager..."
level=info ts=2020-09-04T03:38:34.454Z caller=main.go:755 msg="Notifier manager stopped"
level=info ts=2020-09-04T03:38:34.454Z caller=main.go:767 msg="See you next time!"
level=info ts=2020-09-04T03:38:34.952Z caller=main.go:308 msg="No time or size retention was set so using the default time retention" duration=15d
level=info ts=2020-09-04T03:38:34.952Z caller=main.go:343 msg="Starting Prometheus" version="(version=2.20.1, branch=HEAD, revision=983ebb4a513302315a8117932ab832815f85e3d2)"
level=info ts=2020-09-04T03:38:34.952Z caller=main.go:344 build_context="(go=go1.14.6, user=root@7cbd4d1c15e0, date=20200805-17:26:58)"
level=info ts=2020-09-04T03:38:34.952Z caller=main.go:345 host_details="(Linux 4.14.15-1.el7.elrepo.x86_64 #1 SMP Tue Jan 23 20:28:26 EST 2018 x86_64 34ba7bdc34ce (none))"
level=info ts=2020-09-04T03:38:34.952Z caller=main.go:346 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2020-09-04T03:38:34.952Z caller=main.go:347 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2020-09-04T03:38:34.954Z caller=web.go:524 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2020-09-04T03:38:34.954Z caller=main.go:684 msg="Starting TSDB ..."
level=info ts=2020-09-04T03:38:34.955Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1599023110806 maxt=1599069600000 ulid=01EH8GS198RCSZBAPC1Z7P629X
level=info ts=2020-09-04T03:38:34.956Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1599069600000 maxt=1599134400000 ulid=01EHA7PVAGDKSJTEJE1572FC7J
level=info ts=2020-09-04T03:38:34.956Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1599134400000 maxt=1599156000000 ulid=01EHAWA116KSFTR969PJ5MKAQ2
level=info ts=2020-09-04T03:38:34.956Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1599177600000 maxt=1599184800000 ulid=01EHBGWZ2SH6X5N3JK6T4NRM2Z
level=info ts=2020-09-04T03:38:34.956Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1599156000000 maxt=1599177600000 ulid=01EHBGWZ4S5F3GJ259C7070CYK
level=info ts=2020-09-04T03:38:34.963Z caller=head.go:641 component=tsdb msg="Replaying on-disk memory mappable chunks if any"
level=info ts=2020-09-04T03:38:34.964Z caller=head.go:655 component=tsdb msg="On-disk memory mappable chunks replay completed" duration=899.254µs
level=info ts=2020-09-04T03:38:34.964Z caller=head.go:661 component=tsdb msg="Replaying WAL, this may take a while"
level=info ts=2020-09-04T03:38:34.967Z caller=head.go:687 component=tsdb msg="WAL checkpoint loaded"
level=info ts=2020-09-04T03:38:34.997Z caller=head.go:713 component=tsdb msg="WAL segment loaded" segment=38 maxSegment=41
level=info ts=2020-09-04T03:38:35.002Z caller=head.go:713 component=tsdb msg="WAL segment loaded" segment=39 maxSegment=41
level=info ts=2020-09-04T03:38:35.029Z caller=head.go:713 component=tsdb msg="WAL segment loaded" segment=40 maxSegment=41
level=info ts=2020-09-04T03:38:35.029Z caller=head.go:713 component=tsdb msg="WAL segment loaded" segment=41 maxSegment=41
level=info ts=2020-09-04T03:38:35.029Z caller=head.go:716 component=tsdb msg="WAL replay completed" checkpoint_replay_duration=2.805909ms wal_replay_duration=62.637276ms total_replay_duration=66.40127ms
level=info ts=2020-09-04T03:38:35.031Z caller=main.go:700 fs_type=EXT4_SUPER_MAGIC
level=info ts=2020-09-04T03:38:35.031Z caller=main.go:701 msg="TSDB started"
level=info ts=2020-09-04T03:38:35.031Z caller=main.go:805 msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml
level=info ts=2020-09-04T03:38:35.032Z caller=main.go:833 msg="Completed loading of configuration file" filename=/etc/prometheus/prometheus.yml
level=info ts=2020-09-04T03:38:35.032Z caller=main.go:652 msg="Server is ready to receive web requests."
level=warn ts=2020-09-04T03:47:13.411Z caller=main.go:530 msg="Received SIGTERM, exiting gracefully..."
level=info ts=2020-09-04T03:47:13.411Z caller=main.go:553 msg="Stopping scrape discovery manager..."
level=info ts=2020-09-04T03:47:13.411Z caller=main.go:567 msg="Stopping notify discovery manager..."
level=info ts=2020-09-04T03:47:13.411Z caller=main.go:589 msg="Stopping scrape manager..."
level=info ts=2020-09-04T03:47:13.411Z caller=main.go:549 msg="Scrape discovery manager stopped"
level=info ts=2020-09-04T03:47:13.411Z caller=main.go:563 msg="Notify discovery manager stopped"
level=info ts=2020-09-04T03:47:13.412Z caller=manager.go:888 component="rule manager" msg="Stopping rule manager..."
level=info ts=2020-09-04T03:47:13.412Z caller=manager.go:898 component="rule manager" msg="Rule manager stopped"
level=info ts=2020-09-04T03:47:13.412Z caller=main.go:583 msg="Scrape manager stopped"
level=info ts=2020-09-04T03:47:13.412Z caller=notifier.go:601 component=notifier msg="Stopping notification manager..."
level=info ts=2020-09-04T03:47:13.412Z caller=main.go:755 msg="Notifier manager stopped"
level=info ts=2020-09-04T03:47:13.412Z caller=main.go:767 msg="See you next time!"
level=info ts=2020-09-04T03:47:13.939Z caller=main.go:308 msg="No time or size retention was set so using the default time retention" duration=15d
level=info ts=2020-09-04T03:47:13.939Z caller=main.go:343 msg="Starting Prometheus" version="(version=2.20.1, branch=HEAD, revision=983ebb4a513302315a8117932ab832815f85e3d2)"
level=info ts=2020-09-04T03:47:13.939Z caller=main.go:344 build_context="(go=go1.14.6, user=root@7cbd4d1c15e0, date=20200805-17:26:58)"
level=info ts=2020-09-04T03:47:13.939Z caller=main.go:345 host_details="(Linux 4.14.15-1.el7.elrepo.x86_64 #1 SMP Tue Jan 23 20:28:26 EST 2018 x86_64 34ba7bdc34ce (none))"
level=info ts=2020-09-04T03:47:13.939Z caller=main.go:346 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2020-09-04T03:47:13.939Z caller=main.go:347 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2020-09-04T03:47:13.941Z caller=web.go:524 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2020-09-04T03:47:13.941Z caller=main.go:684 msg="Starting TSDB ..."
level=info ts=2020-09-04T03:47:13.942Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1599023110806 maxt=1599069600000 ulid=01EH8GS198RCSZBAPC1Z7P629X
level=info ts=2020-09-04T03:47:13.942Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1599069600000 maxt=1599134400000 ulid=01EHA7PVAGDKSJTEJE1572FC7J
level=info ts=2020-09-04T03:47:13.942Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1599134400000 maxt=1599156000000 ulid=01EHAWA116KSFTR969PJ5MKAQ2
level=info ts=2020-09-04T03:47:13.942Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1599177600000 maxt=1599184800000 ulid=01EHBGWZ2SH6X5N3JK6T4NRM2Z
level=info ts=2020-09-04T03:47:13.942Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1599156000000 maxt=1599177600000 ulid=01EHBGWZ4S5F3GJ259C7070CYK
level=info ts=2020-09-04T03:47:13.948Z caller=head.go:641 component=tsdb msg="Replaying on-disk memory mappable chunks if any"
level=info ts=2020-09-04T03:47:13.949Z caller=head.go:655 component=tsdb msg="On-disk memory mappable chunks replay completed" duration=727.383µs
level=info ts=2020-09-04T03:47:13.949Z caller=head.go:661 component=tsdb msg="Replaying WAL, this may take a while"
level=info ts=2020-09-04T03:47:13.952Z caller=head.go:687 component=tsdb msg="WAL checkpoint loaded"
level=info ts=2020-09-04T03:47:13.983Z caller=head.go:713 component=tsdb msg="WAL segment loaded" segment=38 maxSegment=42
level=info ts=2020-09-04T03:47:13.987Z caller=head.go:713 component=tsdb msg="WAL segment loaded" segment=39 maxSegment=42
level=info ts=2020-09-04T03:47:14.017Z caller=head.go:713 component=tsdb msg="WAL segment loaded" segment=40 maxSegment=42
level=info ts=2020-09-04T03:47:14.023Z caller=head.go:713 component=tsdb msg="WAL segment loaded" segment=41 maxSegment=42
level=info ts=2020-09-04T03:47:14.023Z caller=head.go:713 component=tsdb msg="WAL segment loaded" segment=42 maxSegment=42
level=info ts=2020-09-04T03:47:14.023Z caller=head.go:716 component=tsdb msg="WAL replay completed" checkpoint_replay_duration=2.967015ms wal_replay_duration=71.121903ms total_replay_duration=74.856822ms
level=info ts=2020-09-04T03:47:14.025Z caller=main.go:700 fs_type=EXT4_SUPER_MAGIC
level=info ts=2020-09-04T03:47:14.025Z caller=main.go:701 msg="TSDB started"
level=info ts=2020-09-04T03:47:14.025Z caller=main.go:805 msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml
level=info ts=2020-09-04T03:47:14.026Z caller=main.go:833 msg="Completed loading of configuration file" filename=/etc/prometheus/prometheus.yml
level=info ts=2020-09-04T03:47:14.026Z caller=main.go:652 msg="Server is ready to receive web requests."
I hope you can help. Thanks.
Yes, you can monitor Kubernetes from outside the cluster, but it is strongly advised against. Prometheus works best when it runs inside the k8s cluster. A cleaner solution is to run Prometheus + Thanos/Cortex in the cluster and use a secondary, central Prometheus to monitor everything.
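One way to wire up the "central Prometheus" part is plain Prometheus federation, where the central server scrapes the /federate endpoint of the in-cluster one. This is a different mechanism from Thanos or Cortex, and only a minimal sketch: the target address in-cluster-prometheus.example.com:9090 is a hypothetical placeholder, and the match[] selector assumes your in-cluster job names start with kubernetes-. It would go into the central server's prometheus.yml:

```yaml
scrape_configs:
  - job_name: federate
    # Keep the labels set by the in-cluster Prometheus instead of overwriting them.
    honor_labels: true
    metrics_path: /federate
    params:
      'match[]':
        # Assumption: pull only the series from the Kubernetes jobs.
        - '{job=~"kubernetes-.*"}'
    static_configs:
      - targets:
          # Hypothetical address at which the in-cluster Prometheus is exposed.
          - 'in-cluster-prometheus.example.com:9090'
```

The in-cluster Prometheus has to be reachable from outside for this to work, for example through a NodePort or an ingress.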
To resolve the issue you have, you also need to provide the certificate. Have a look at: https://github.com/prometheus/prometheus/blob/release-2.20/documentation/examples/prometheus-kubernetes.yml
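Adapted to a Prometheus running outside the cluster, that could look roughly like the sketch below. Service discovery and the scrape requests authenticate separately, so the CA file and token are given at both levels. The sketch assumes the cluster CA has been copied to /etc/prometheus/ca.crt (the path from your commented-out line) and that the service account behind prome.token is allowed to read the endpoints and the nodes proxy; neither is confirmed by the question. It also points __address__ at the external API server address, because kubernetes.default.svc normally only resolves inside the cluster.

```yaml
scrape_configs:
  - job_name: kubernetes-apiservers
    scheme: https
    # Credentials used for the scrape requests themselves.
    tls_config:
      ca_file: /etc/prometheus/ca.crt   # assumption: cluster CA copied here
    bearer_token_file: /etc/prometheus/prome.token
    kubernetes_sd_configs:
      - api_server: https://192.168.0.146:6443
        role: endpoints
        # Credentials used for service discovery against the API server.
        tls_config:
          ca_file: /etc/prometheus/ca.crt
        bearer_token_file: /etc/prometheus/prome.token
    relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https

  - job_name: kubernetes-nodes
    scheme: https
    tls_config:
      ca_file: /etc/prometheus/ca.crt
    bearer_token_file: /etc/prometheus/prome.token
    kubernetes_sd_configs:
      - api_server: https://192.168.0.146:6443
        role: node
        tls_config:
          ca_file: /etc/prometheus/ca.crt
        bearer_token_file: /etc/prometheus/prome.token
    relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      # Proxy node metrics through the API server's external address.
      - target_label: __address__
        replacement: 192.168.0.146:6443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics
```

After reloading, the Targets page in the Prometheus web UI (Status > Targets) shows the per-target scrape error, which is usually more informative than the server log for this kind of problem.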