Why is the Prometheus operator not able to start?

3/14/2019

I'm trying to deploy Prometheus with the Prometheus operator in a fresh k8s cluster, using the following files:

  1. I create a namespace called monitoring.
  2. I apply this file, which works OK:
apiVersion: apps/v1beta2
kind: Deployment
metadata:
  labels:
    k8s-app: prometheus-operator
  name: prometheus-operator
  namespace: monitoring
spec:
  replicas: 2
  selector:
    matchLabels:
      k8s-app: prometheus-operator
  template:
    metadata:
      labels:
        k8s-app: prometheus-operator
    spec:
      priorityClassName: "operator-critical"
      tolerations:
      - key: "WorkGroup"
        operator: "Equal"
        value: "operator"
        effect: "NoSchedule"
      - key: "WorkGroup"
        operator: "Equal"
        value: "operator"
        effect: "NoExecute"
      containers:
      - args:
        - --kubelet-service=kube-system/kubelet
        - --logtostderr=true
        - --config-reloader-image=quay.io/coreos/configmap-reload:v0.0.1
        - --prometheus-config-reloader=quay.io/coreos/prometheus-config-reloader:v0.29.0
        image: quay.io/coreos/prometheus-operator:v0.29.0
        name: prometheus-operator
        ports:
        - containerPort: 8080
          name: http
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
      nodeSelector:
      serviceAccountName: prometheus-operator

Now I want to apply this file (a Prometheus custom resource):

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  namespace: monitoring
  labels: 
    prometheus: prometheus
spec:
  replicas: 1
  priorityClassName: "operator-critical"
  serviceAccountName: prometheus
  nodeSelector:
        worker.garden.sapcloud.io/group: operator
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector:
    matchLabels:
      role: observeable
  tolerations:
  - key: "WorkGroup"
    operator: "Equal"
    value: "operator"
    effect: "NoSchedule"
  - key: "WorkGroup"
    operator: "Equal"
    value: "operator"
    effect: "NoExecute"

Before that, I created these CRDs:

https://github.com/coreos/prometheus-operator/tree/master/example/prometheus-operator-crd

The problem is that the operator pods are not able to start (0/2 ready). What could be the problem? Please advise.

Update:

When I look at the events of the Prometheus operator, I see the following error from replicaset-controller: Error creating: pods "prometheus-operator-6944778645-" is forbidden: no PriorityClass with name operator-critical was found. Any idea?

-- Jenny M
google-cloud-platform
google-kubernetes-engine
kubernetes
prometheus
prometheus-operator

2 Answers

3/15/2019

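You are referencing the operator-critical priority class, which does not exist in your cluster, so the pods are rejected when the ReplicaSet tries to create them. Priority classes determine the scheduling priority of pods relative to other pods, including which pods can be preempted when resources are scarce.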

To fix this issue, you can either remove the explicit priority class (priorityClassName: "operator-critical") from both files or create the operator-critical class:

apiVersion: scheduling.k8s.io/v1beta1
kind: PriorityClass
metadata:
  name: operator-critical
value: 1000000
globalDefault: false
description: "Critical operator workloads"
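
The manifest above uses the beta API group. As far as I know, scheduling.k8s.io/v1 has been available since Kubernetes 1.14 and the v1beta1 version was removed in 1.22, so on a newer cluster the same class would look roughly like this (a sketch under that assumption; the value and description are unchanged from above):

```yaml
# PriorityClass is cluster-scoped, so it takes no namespace.
# Assumes a cluster that serves scheduling.k8s.io/v1 (Kubernetes 1.14+).
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: operator-critical   # must match priorityClassName in the pod specs
value: 1000000              # higher values mean higher scheduling priority
globalDefault: false        # do not apply this class to pods without a priorityClassName
description: "Critical operator workloads"
```

Once the class exists, the ReplicaSet controller should retry creating the pods on its own, so the deployment does not need to be re-applied.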

-- Lukas Eichler
Source: StackOverflow

3/14/2019

Prometheus and Alertmanager pods need persistent volumes to store their data. Make sure those PVs are present and bound for the respective pods. Alternatively, you can make those pods ephemeral. It should work then.
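
As a minimal sketch of the persistent option, assuming the storage field of the Prometheus custom resource and a storage class named standard (the class name and the 10Gi size are placeholders, not taken from the question), the resource from the question could request a volume like this:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  namespace: monitoring
spec:
  serviceAccountName: prometheus
  storage:
    # The operator turns this template into one PersistentVolumeClaim per replica.
    volumeClaimTemplate:
      spec:
        storageClassName: standard   # assumption: adjust to a class that exists in your cluster
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi            # assumption: placeholder size
```

For the ephemeral variant, leaving out the storage block should, as far as I know, make the operator fall back to an emptyDir volume, so the data is lost whenever the pod is rescheduled.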

-- P Ekambaram
Source: StackOverflow