We are using Prometheus for metrics collection. Prometheus is deployed as a container and collects metrics from various sources, storing the data on the local disk of the node where the container is running. If the node that holds the container fails, we lose the metrics along with that node, because Prometheus stored all metrics locally. Kubernetes will detect the container failure and spawn the container on a healthy node, but the data on the old node is lost.
To solve this issue we have come up with two ideas:

1. Decouple Prometheus entirely from Kubernetes.
2. Decouple only the storage, e.g. via an NFS-like protocol (a PV in Kubernetes terms).
Which one should we use?
If any other industry solution exists, please share that too. If either of the above has unmentioned side effects, kindly let me know.
Prometheus can replicate data to remote storage via the remote_write API. This means the data isn't lost on a Prometheus pod restart in the k8s cluster, since it has already been replicated to remote storage. See how to set up remote storage in Prometheus. This example uses VictoriaMetrics, a cost-efficient open source remote storage for Prometheus.
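As a minimal sketch, a `remote_write` section in `prometheus.yml` could look like this, assuming a VictoriaMetrics single-node instance is reachable in-cluster at `victoria-metrics:8428` (the hostname, port, and queue sizes are placeholders for your own setup):

```yaml
# prometheus.yml -- minimal remote_write sketch (assumed endpoint)
remote_write:
  - url: http://victoria-metrics:8428/api/v1/write   # VictoriaMetrics write endpoint
    queue_config:
      max_samples_per_send: 10000   # samples per outgoing request (tune to taste)
      capacity: 20000               # in-memory buffer per shard before backpressure
```

With this in place, samples are shipped to the remote store as they are scraped, so a pod restart only risks whatever is still sitting in the in-memory send queue.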
I wouldn't recommend Thanos, since it doesn't prevent data loss for the most recent 2 hours of metrics on Prometheus pod restarts. See this article for details.
There is also another option in addition to storing the Prometheus data on a Persistent Volume (PV). You can use the exporters supported by Prometheus, as mentioned here. These exporters take the scraped data and store it in an external database such as Elasticsearch or MySQL, and that data can then be used by another Prometheus instance (see the sketch below) in case the previous instance crashes.
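A rough sketch of how the replacement Prometheus instance could point at such external storage, assuming a third-party remote storage adapter exposes read/write endpoints at `remote-adapter:9201` (the address and paths are placeholders, not a specific adapter's API):

```yaml
# prometheus.yml on the replacement instance -- sketch with an assumed adapter
remote_write:
  - url: http://remote-adapter:9201/write   # ship new samples to the external DB
remote_read:
  - url: http://remote-adapter:9201/read    # query historical data written by the old instance
    read_recent: true                       # also consult the remote store for recent data
```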
Short-term answer: use a PV, but probably not NFS, since you don't need multiple writers. A simple network block device (EBS, GCP Persistent Disk, etc.) is fine. Long term, HA Prometheus is a complex topic; check out the Thanos project for some ideas and tech. Also, the Grafana Labs folks have been experimenting with some new HA layouts for it. Expect full HA Prometheus to be a very substantial project requiring you to dive deep into the internals.
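For the short-term PV route, a minimal sketch might look like the following, assuming your cluster offers a block-storage StorageClass (the name `gp3`, the size, and the image tag are assumptions, not fixed requirements):

```yaml
# PVC backed by a network block device (e.g. EBS/GCE PD via a CSI driver).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data
spec:
  accessModes: ["ReadWriteOnce"]   # one writer is enough; no need for NFS-style RWX
  storageClassName: gp3            # assumption: whatever block-storage class you have
  resources:
    requests:
      storage: 50Gi
---
# Minimal pod mounting the claim at Prometheus's data path.
apiVersion: v1
kind: Pod
metadata:
  name: prometheus
spec:
  containers:
    - name: prometheus
      image: prom/prometheus:v2.47.0
      args: ["--storage.tsdb.path=/prometheus"]
      volumeMounts:
        - name: data
          mountPath: /prometheus
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: prometheus-data
```

If the pod is rescheduled, Kubernetes reattaches the same volume to the new node (within the constraints of the storage backend, e.g. the same availability zone for EBS), so the TSDB data survives the restart.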