Kubernetes pod fails with "Back-off restarting failed container"

4/28/2020

I am trying to set up Prometheus logging. I am deploying the YAML files below, but the pod fails with "Back-off restarting failed container".

Complete description:

Name:         prometheus-75dd748df4-wrwlr
Namespace:    monitoring
Priority:     0
Node:         kbs-vm-02/172.16.1.8
Start Time:   Tue, 28 Apr 2020 06:13:22 +0000
Labels:       app=prometheus
              pod-template-hash=75dd748df4
Annotations:  <none>
Status:       Running
IP:           10.44.0.7
IPs:
  IP:           10.44.0.7
Controlled By:  ReplicaSet/prometheus-75dd748df4
Containers:
  prom:
    Container ID:  docker://50fb273836c5522bbbe01d8db36e18688e0f673bc54066f364290f0f6854a74f
    Image:         quay.io/prometheus/prometheus:v2.4.3
    Image ID:      docker-pullable://quay.io/prometheus/prometheus@sha256:8e0e85af45fc2bcc18bd7221b8c92fe4bb180f6bd5e30aa2b226f988029c2085
    Port:          9090/TCP
    Host Port:     0/TCP
    Args:
      --config.file=/prometheus-cfg/prometheus.yml
      --storage.tsdb.path=/data
      --storage.tsdb.retention=$(STORAGE_LOCAL_RETENTION)
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 28 Apr 2020 06:14:08 +0000
      Finished:     Tue, 28 Apr 2020 06:14:08 +0000
    Ready:          False
    Restart Count:  3
    Limits:
      memory:  1Gi
    Requests:
      cpu:     200m
      memory:  500Mi
    Environment Variables from:
      prometheus-config-flags  ConfigMap  Optional: false
    Environment:               <none>
    Mounts:
      /data from storage (rw)
      /prometheus-cfg from config-file (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from prometheus-token-bt7dw (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  config-file:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      prometheus-config-file
    Optional:  false
  storage:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  prometheus-storage-claim
    ReadOnly:   false
  prometheus-token-bt7dw:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  prometheus-token-bt7dw
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                From                Message
  ----     ------            ----               ----                -------
  Warning  FailedScheduling  76s (x3 over 78s)  default-scheduler   running "VolumeBinding" filter plugin for pod "prometheus-75dd748df4-wrwlr": pod has unbound immediate PersistentVolumeClaims
  Normal   Scheduled         73s                default-scheduler   Successfully assigned monitoring/prometheus-75dd748df4-wrwlr to kbs-vm-02
  Normal   Pulled            28s (x4 over 72s)  kubelet, kbs-vm-02  Container image "quay.io/prometheus/prometheus:v2.4.3" already present on machine
  Normal   Created           28s (x4 over 72s)  kubelet, kbs-vm-02  Created container prom
  Normal   Started           27s (x4 over 71s)  kubelet, kbs-vm-02  Started container prom
  Warning  BackOff           13s (x6 over 69s)  kubelet, kbs-vm-02  Back-off restarting failed container

Deployment YAML:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitoring
  labels:
    app: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      securityContext:
        fsGroup: 1000
      serviceAccountName: prometheus
      containers:
      - image: quay.io/prometheus/prometheus:v2.4.3
        name: prom
        args:
        - '--config.file=/prometheus-cfg/prometheus.yml'
        - '--storage.tsdb.path=/data'
        - '--storage.tsdb.retention=$(STORAGE_LOCAL_RETENTION)'
        envFrom:
        - configMapRef:
            name: prometheus-config-flags
        ports:
        - containerPort: 9090
          name: prom-port
        resources:
          limits:
            memory: 1Gi
          requests:
            cpu: 200m
            memory: 500Mi
        volumeMounts:
        - name: config-file
          mountPath: /prometheus-cfg
        - name: storage
          mountPath: /data
      volumes:
      - name: config-file
        configMap:
          name: prometheus-config-file
      - name: storage
        persistentVolumeClaim:
          claimName: prometheus-storage-claim

PV YAML:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-storage
  namespace: monitoring
  labels:
    app: prometheus
spec:
  capacity:
    storage: 12Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/data"

PVC YAML:

[vidya@KBS-VM-01 7-1_prometheus]$ cat prometheus/prom-pvc.yml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-storage-claim
  namespace: monitoring
  labels:
    app: prometheus
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

Do you know what the issue is and how to fix it? Please also let me know if I need to share any more files.

My guess is that something is wrong with the storage configuration, judging by this entry in the event log:

Warning FailedScheduling 76s (x3 over 78s) default-scheduler running "VolumeBinding" filter plugin for pod "prometheus-75dd748df4-wrwlr": pod has unbound immediate PersistentVolumeClaims

I am using local storage.

[vidya@KBS-VM-01 7-1_prometheus]$ kubectl describe pvc prometheus-storage-claim -n monitoring
Name:          prometheus-storage-claim
Namespace:     monitoring
StorageClass:
Status:        Bound
Volume:        prometheus-storage
Labels:        app=prometheus
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      12Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Mounted By:    prometheus-75dd748df4-wrwlr
Events:
  Type    Reason         Age   From                         Message
  ----    ------         ----  ----                         -------
  Normal  FailedBinding  37m   persistentvolume-controller  no persistent volumes available for this claim and no storage class is set



[vidya@KBS-VM-01 7-1_prometheus]$ kubectl logs prometheus-75dd748df4-zlncv -n monitoring
level=info ts=2020-04-28T07:49:07.885529914Z caller=main.go:238 msg="Starting Prometheus" version="(version=2.4.3, branch=HEAD, revision=167a4b4e73a8eca8df648d2d2043e21bdb9a7449)"
level=info ts=2020-04-28T07:49:07.885635014Z caller=main.go:239 build_context="(go=go1.11.1, user=root@1e42b46043e9, date=20181004-08:42:02)"
level=info ts=2020-04-28T07:49:07.885812014Z caller=main.go:240 host_details="(Linux 3.10.0-1062.1.1.el7.x86_64 #1 SMP Fri Sep 13 22:55:44 UTC 2019 x86_64 prometheus-75dd748df4-zlncv (none))"
level=info ts=2020-04-28T07:49:07.885833214Z caller=main.go:241 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2020-04-28T07:49:07.885849614Z caller=main.go:242 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2020-04-28T07:49:07.888695413Z caller=main.go:554 msg="Starting TSDB ..."
level=info ts=2020-04-28T07:49:07.889017612Z caller=main.go:423 msg="Stopping scrape discovery manager..."
level=info ts=2020-04-28T07:49:07.889033512Z caller=main.go:437 msg="Stopping notify discovery manager..."
level=info ts=2020-04-28T07:49:07.889041112Z caller=main.go:459 msg="Stopping scrape manager..."
level=info ts=2020-04-28T07:49:07.889048812Z caller=main.go:433 msg="Notify discovery manager stopped"
level=info ts=2020-04-28T07:49:07.889071612Z caller=main.go:419 msg="Scrape discovery manager stopped"
level=info ts=2020-04-28T07:49:07.889083112Z caller=main.go:453 msg="Scrape manager stopped"
level=info ts=2020-04-28T07:49:07.889098012Z caller=manager.go:638 component="rule manager" msg="Stopping rule manager..."
level=info ts=2020-04-28T07:49:07.889109912Z caller=manager.go:644 component="rule manager" msg="Rule manager stopped"
level=info ts=2020-04-28T07:49:07.889124912Z caller=notifier.go:512 component=notifier msg="Stopping notification manager..."
level=info ts=2020-04-28T07:49:07.889137812Z caller=main.go:608 msg="Notifier manager stopped"
level=info ts=2020-04-28T07:49:07.889169012Z caller=web.go:397 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=error ts=2020-04-28T07:49:07.889653412Z caller=main.go:617 err="opening storage failed: lock DB directory: open /data/lock: permission denied"
-- Vidya
docker
grafana
kubernetes
prometheus

1 Answer

4/28/2020

The problem here is that the PVC is not bound to the PV, primarily because there is no storage class to link the PV with the PVC, and because the capacity in the PV (12Gi) does not match the request in the PVC (10Gi). As a result, Kubernetes could not figure out which PV the PVC should be bound to.

  1. Add storageClassName: manual to the spec of both the PV and the PVC.
  2. Make the capacity in the PV and the request in the PVC the same, i.e. 10Gi.

PV

apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-storage
  namespace: monitoring
  labels:
    app: prometheus
spec:
  storageClassName: manual
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/data"

PVC

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-storage-claim
  namespace: monitoring
  labels:
    app: prometheus
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

Update:

Running the pod as root by adding runAsUser: 0 to its securityContext should solve the "open /data/lock: permission denied" error.
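
As a rough sketch of where that setting could go: the fragment below shows only the affected part of the Deployment from the question. The placement at pod level, next to the existing fsGroup entry, is an assumption; runAsUser can also be set in a container-level securityContext instead.

```yaml
# Fragment of the prometheus Deployment; all other fields stay as in the question.
spec:
  template:
    spec:
      securityContext:
        runAsUser: 0     # assumption: set at pod level so the prom container runs as root
        fsGroup: 1000    # unchanged from the original Deployment
      serviceAccountName: prometheus
```

After merging this into the Deployment and re-applying it, the output of kubectl logs for the new pod should show whether the lock on /data/lock can now be acquired.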

-- Arghya Sadhu
Source: StackOverflow