I am new to prometheus/alertmanager.
I have created a cron job that runs a shell script every minute. The script generates a "test.prom" file (containing a gauge metric) in the directory passed to node-exporter's --collector.textfile.directory
argument. I verified (using curl http://localhost:9100/metrics) that node-exporter exposes the custom metric correctly.
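For reference, a minimal sketch of what such a cron-driven script could look like. The TEXTFILE_DIR variable, the function name, and the value written are placeholders, not my actual script. The file is written under a temporary name first and then renamed, so node-exporter never reads a half-written file.

```shell
#!/bin/sh
# Sketch of a cron-driven writer for the node-exporter textfile collector.
# TEXTFILE_DIR is a placeholder: point it at the directory passed to
# node-exporter. It falls back to a temporary directory for a dry run.
TEXTFILE_DIR="${TEXTFILE_DIR:-$(mktemp -d)}"

write_test_metric() {
    value="$1"
    # The temporary name does not end in .prom, so it is not collected.
    tmp="$TEXTFILE_DIR/test.prom.$$"
    {
        echo "# HELP test_metric Example gauge written by a cron job"
        echo "# TYPE test_metric gauge"
        echo "test_metric $value"
    } > "$tmp"
    # A rename within the same filesystem replaces the old file atomically.
    mv "$tmp" "$TEXTFILE_DIR/test.prom"
}

write_test_metric 1
cat "$TEXTFILE_DIR/test.prom"
```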
When I run a query against that custom metric in the Prometheus dashboard, it returns no results (it says no data found).
I could not figure out why the query against the metric exposed via the node-exporter textfile collector fails. Any clues as to what I missed? Also, how can I check that Prometheus has actually scraped my custom metric 'test_metric'?
My query in the Prometheus dashboard is test_metric != 0, which did not return any results, even though test_metric is exposed via the node-exporter textfile collector.
Any help is appreciated!
BTW, node-exporter is running as a Docker container in a Kubernetes environment.
My bad. I had not included scrape instructions for node-exporter in the prometheus.yaml file. It worked after I added them.
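For anyone hitting the same thing, here is a rough way to check from the command line whether Prometheus is scraping node-exporter at all. It assumes Prometheus is reachable at http://localhost:9090 (adjust PROM_URL otherwise), and the function names are made up for this sketch. The /api/v1/targets and /api/v1/query endpoints are part of the Prometheus HTTP API.

```shell
#!/bin/sh
# Assumption: Prometheus listens on its default port on localhost.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Build the URL of the targets endpoint, which lists every scrape target
# together with its health and last scrape error.
targets_url() {
    printf '%s/api/v1/targets' "$PROM_URL"
}

# Print the scrape targets, then run an instant query for the given metric.
check_metric() {
    curl -s "$(targets_url)"
    echo
    # -G sends the URL-encoded query as a GET parameter.
    curl -sG "$PROM_URL/api/v1/query" --data-urlencode "query=$1"
    echo
}

# Usage (needs a running Prometheus):
#   check_metric test_metric
```

If node-exporter does not appear in the targets output, the scrape job is missing from the configuration, which was my case. An empty "result" list in the query response means Prometheus has no current sample for that metric. The same target information is shown in the web UI under Status > Targets.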
I had a similar situation, but it was not a configuration problem.
Instead, my data included timestamps:
# HELP network_connectivity_rtt Round Trip Time to each node
# TYPE network_connectivity_rtt gauge
network_connectivity_rtt{host="home"} 53.87 1541426242
network_connectivity_rtt{host="hop_1"} 58.8 1541426242
network_connectivity_rtt{host="hop_2"} 21.93 1541426242
network_connectivity_rtt{host="hop_3"} 71.69 1541426242
The Prometheus node-exporter (PNE) was picking them up without any problem once I reloaded it. As Prometheus is running under systemd, I had to check the logs like this:
journalctl --system -u prometheus.service --follow
There I read this line:
msg="Error on ingesting samples that are too old or are too far into the future"
Once I removed the timestamps, values started appearing. This led me to read about the timestamps in more detail, and I found out that they have to be in milliseconds. So this format is now OK:
# HELP network_connectivity_rtt Round Trip Time to each node
# TYPE network_connectivity_rtt gauge
network_connectivity_rtt{host="home"} 50.47 1541429581376
network_connectivity_rtt{host="hop_1"} 3.38 1541429581376
network_connectivity_rtt{host="hop_2"} 11.2 1541429581376
network_connectivity_rtt{host="hop_3"} 20.72 1541429581376
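A small sketch of producing the timestamp in milliseconds from a shell script. The to_millis helper is a name I made up, and date +%s is assumed to be available (it is on GNU and BSD systems). A value such as 1541426242 read as milliseconds lands in mid-January 1970, which explains the "too old" error. Also note that, as far as I know, newer node-exporter releases document that the textfile collector does not support timestamps, so this only matters where they are passed through.

```shell
#!/bin/sh
# Convert a Unix time in seconds to the milliseconds the text format expects.
to_millis() {
    echo $(( $1 * 1000 ))
}

# Current time in milliseconds (second resolution is enough here).
now_ms=$(to_millis "$(date +%s)")

# Emit one sample line with an explicit millisecond timestamp.
printf 'network_connectivity_rtt{host="home"} %s %s\n' "53.87" "$now_ms"
```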
I hope it helps someone else.
This issue can also happen because of stale metrics. Let's say you wrote your metric to the file at 13:00. If nothing newer is ingested, by default Prometheus considers the sample stale after 5 minutes, so it may already have disappeared by the time you run your query.
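One quick way to rule this out is to check how old the .prom file is. This is only a sketch: the function name is made up, and it relies on find supporting -mmin, which GNU and BSD find do although it is not in POSIX.

```shell
#!/bin/sh
# Succeeds if the file was last modified more than about 5 minutes ago.
is_older_than_5min() {
    # find prints the path only when the modification time matches -mmin +5.
    [ -n "$(find "$1" -mmin +5 2>/dev/null)" ]
}

# Usage (the path is a placeholder):
#   is_older_than_5min /path/to/textfile/dir/test.prom && echo "file looks stale"
```

If the file is old, check that the cron job is still running and still writing to the directory node-exporter reads.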