Understanding persistent storage for Prometheus installed from [stable/prometheus] on Kubernetes/EKS with Grafana

4/14/2020

Problem: The Prometheus query results shown in Grafana are not consistent when the Prometheus server runs with multiple replicas for HA. I want to achieve consistent storage access together with high availability.

Scenario-1

ReplicaCount=1 statefulSet=false Created PersistentVolume Count=1

The application works as expected, but it is not HA.

Scenario-2

ReplicaCount=3 statefulSet=false Created PersistentVolume Count=1

In this scenario only one pod is running; the other pods fail with a lock error from the Prometheus server.

Scenario-2.1

ReplicaCount=3 statefulSet=false Created PersistentVolume Count=1 with "--storage.tsdb.no-lockfile"

This does create 3 pods, but all except one throw a Go error from the Prometheus server application.
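
For reference, Scenario-2.1 would correspond to chart values roughly like the sketch below. The key names (`server.replicaCount`, `server.statefulSet.enabled`, `server.persistentVolume.enabled`, `server.extraFlags`) are assumed from the stable/prometheus chart and should be checked against the values.yaml of chart version 10.0.1 before use.

```yaml
# Sketch only: key names assumed from the stable/prometheus chart.
server:
  replicaCount: 3
  statefulSet:
    enabled: false          # plain Deployment, so all replicas share one claim
  persistentVolume:
    enabled: true           # a single PersistentVolumeClaim for the Deployment
  extraFlags:
    - storage.tsdb.no-lockfile   # rendered as --storage.tsdb.no-lockfile
```

Note that this flag only stops Prometheus from creating the lock file in its data directory. It does not make it safe for several Prometheus processes to write to the same TSDB directory, which is consistent with the errors seen in this scenario.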

Scenario-3

ReplicaCount=3 statefulSet=true Created PersistentVolume Count=3

In this scenario we have 3 replica pods, each with its own persistent storage through the StatefulSet, which is the configuration recommended by the community. But this configuration does not give consistent metrics in Grafana, as the session is not sticky.
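
As a point of comparison, Scenario-3 would look roughly like the following values sketch. The key names are again assumed from the stable/prometheus chart, and the volume size is purely illustrative.

```yaml
# Sketch only: key names assumed from the stable/prometheus chart.
server:
  replicaCount: 3
  statefulSet:
    enabled: true           # StatefulSet: each replica gets its own claim
  persistentVolume:
    enabled: true
    size: 8Gi               # illustrative value, not taken from the setup above
```

With this layout each replica scrapes its targets independently and stores its own samples, so the three data sets are similar but not identical. A Grafana query that is load-balanced across the replicas can therefore return slightly different results from one request to the next.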

Infrastructure information: Helm chart "stable/prometheus", version 10.0.1, appVersion 2.15.2, on AWS EKS with Kubernetes 1.14.

Question: How can I achieve HA for the Prometheus server, with HPA or VPA planned, and with persistent storage on Kubernetes?

Resolution: I am thinking about handling this problem with one of the approaches below:

  1. Use a Deployment with multiple replicas sharing a single persistent volume. This is not working, or maybe my configuration is incorrect.
  2. Expose the Prometheus server through an Application Load Balancer with sticky sessions, and use the ALB URL in Grafana to get consistent query results. Still, I believe it would behave differently for requests that land on different pods.

Has anyone faced this issue before? If yes, how did you achieve HA with persistent storage on Kubernetes? If my configuration is incorrect for this requirement, can you provide a recommended approach and configuration?

Please feel free to ask if anything is missing from the explanation of my current implementation. Thanks in advance.

-- Safoor Safdar
aws-eks
grafana
kubernetes
prometheus
