I am trying to set up Prometheus monitoring. When I deploy the YAMLs below, the pod fails with "Back-off restarting failed container".
kubectl describe pod output:
Name:           prometheus-75dd748df4-wrwlr
Namespace:      monitoring
Priority:       0
Node:           kbs-vm-02/172.16.1.8
Start Time:     Tue, 28 Apr 2020 06:13:22 +0000
Labels:         app=prometheus
                pod-template-hash=75dd748df4
Annotations:    <none>
Status:         Running
IP:             10.44.0.7
IPs:
  IP:           10.44.0.7
Controlled By:  ReplicaSet/prometheus-75dd748df4
Containers:
  prom:
    Container ID:  docker://50fb273836c5522bbbe01d8db36e18688e0f673bc54066f364290f0f6854a74f
    Image:         quay.io/prometheus/prometheus:v2.4.3
    Image ID:      docker-pullable://quay.io/prometheus/prometheus@sha256:8e0e85af45fc2bcc18bd7221b8c92fe4bb180f6bd5e30aa2b226f988029c2085
    Port:          9090/TCP
    Host Port:     0/TCP
    Args:
      --config.file=/prometheus-cfg/prometheus.yml
      --storage.tsdb.path=/data
      --storage.tsdb.retention=$(STORAGE_LOCAL_RETENTION)
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 28 Apr 2020 06:14:08 +0000
      Finished:     Tue, 28 Apr 2020 06:14:08 +0000
    Ready:          False
    Restart Count:  3
    Limits:
      memory:  1Gi
    Requests:
      cpu:     200m
      memory:  500Mi
    Environment Variables from:
      prometheus-config-flags  ConfigMap  Optional: false
    Environment:  <none>
    Mounts:
      /data from storage (rw)
      /prometheus-cfg from config-file (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from prometheus-token-bt7dw (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  config-file:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      prometheus-config-file
    Optional:  false
  storage:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  prometheus-storage-claim
    ReadOnly:   false
  prometheus-token-bt7dw:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  prometheus-token-bt7dw
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                From                Message
  ----     ------            ----               ----                -------
  Warning  FailedScheduling  76s (x3 over 78s)  default-scheduler   running "VolumeBinding" filter plugin for pod "prometheus-75dd748df4-wrwlr": pod has unbound immediate PersistentVolumeClaims
  Normal   Scheduled         73s                default-scheduler   Successfully assigned monitoring/prometheus-75dd748df4-wrwlr to kbs-vm-02
  Normal   Pulled            28s (x4 over 72s)  kubelet, kbs-vm-02  Container image "quay.io/prometheus/prometheus:v2.4.3" already present on machine
  Normal   Created           28s (x4 over 72s)  kubelet, kbs-vm-02  Created container prom
  Normal   Started           27s (x4 over 71s)  kubelet, kbs-vm-02  Started container prom
  Warning  BackOff           13s (x6 over 69s)  kubelet, kbs-vm-02  Back-off restarting failed container
Deployment Yaml data:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitoring
  labels:
    app: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      securityContext:
        fsGroup: 1000
      serviceAccountName: prometheus
      containers:
        - image: quay.io/prometheus/prometheus:v2.4.3
          name: prom
          args:
            - '--config.file=/prometheus-cfg/prometheus.yml'
            - '--storage.tsdb.path=/data'
            - '--storage.tsdb.retention=$(STORAGE_LOCAL_RETENTION)'
          envFrom:
            - configMapRef:
                name: prometheus-config-flags
          ports:
            - containerPort: 9090
              name: prom-port
          resources:
            limits:
              memory: 1Gi
            requests:
              cpu: 200m
              memory: 500Mi
          volumeMounts:
            - name: config-file
              mountPath: /prometheus-cfg
            - name: storage
              mountPath: /data
      volumes:
        - name: config-file
          configMap:
            name: prometheus-config-file
        - name: storage
          persistentVolumeClaim:
            claimName: prometheus-storage-claim
PV Yaml data:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-storage
  namespace: monitoring
  labels:
    app: prometheus
spec:
  capacity:
    storage: 12Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/data"
PVC Yaml data:
[vidya@KBS-VM-01 7-1_prometheus]$ cat prometheus/prom-pvc.yml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-storage-claim
  namespace: monitoring
  labels:
    app: prometheus
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
Do you know what the issue is and how to fix it? Please also let me know if any more files need to be shared.
My guess is that something is wrong with the storage configuration, based on this event:
Warning FailedScheduling 76s (x3 over 78s) default-scheduler running "VolumeBinding" filter plugin for pod "prometheus-75dd748df4-wrwlr": pod has unbound immediate PersistentVolumeClaims
I am using local storage.
[vidya@KBS-VM-01 7-1_prometheus]$ kubectl describe pvc prometheus-storage-claim -n monitoring
Name:          prometheus-storage-claim
Namespace:     monitoring
StorageClass:
Status:        Bound
Volume:        prometheus-storage
Labels:        app=prometheus
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      12Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Mounted By:    prometheus-75dd748df4-wrwlr
Events:
  Type    Reason         Age  From                         Message
  ----    ------         ---  ----                         -------
  Normal  FailedBinding  37m  persistentvolume-controller  no persistent volumes available for this claim and no storage class is set
[vidya@KBS-VM-01 7-1_prometheus]$ kubectl logs prometheus-75dd748df4-zlncv -n monitoring
level=info ts=2020-04-28T07:49:07.885529914Z caller=main.go:238 msg="Starting Prometheus" version="(version=2.4.3, branch=HEAD, revision=167a4b4e73a8eca8df648d2d2043e21bdb9a7449)"
level=info ts=2020-04-28T07:49:07.885635014Z caller=main.go:239 build_context="(go=go1.11.1, user=root@1e42b46043e9, date=20181004-08:42:02)"
level=info ts=2020-04-28T07:49:07.885812014Z caller=main.go:240 host_details="(Linux 3.10.0-1062.1.1.el7.x86_64 #1 SMP Fri Sep 13 22:55:44 UTC 2019 x86_64 prometheus-75dd748df4-zlncv (none))"
level=info ts=2020-04-28T07:49:07.885833214Z caller=main.go:241 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2020-04-28T07:49:07.885849614Z caller=main.go:242 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2020-04-28T07:49:07.888695413Z caller=main.go:554 msg="Starting TSDB ..."
level=info ts=2020-04-28T07:49:07.889017612Z caller=main.go:423 msg="Stopping scrape discovery manager..."
level=info ts=2020-04-28T07:49:07.889033512Z caller=main.go:437 msg="Stopping notify discovery manager..."
level=info ts=2020-04-28T07:49:07.889041112Z caller=main.go:459 msg="Stopping scrape manager..."
level=info ts=2020-04-28T07:49:07.889048812Z caller=main.go:433 msg="Notify discovery manager stopped"
level=info ts=2020-04-28T07:49:07.889071612Z caller=main.go:419 msg="Scrape discovery manager stopped"
level=info ts=2020-04-28T07:49:07.889083112Z caller=main.go:453 msg="Scrape manager stopped"
level=info ts=2020-04-28T07:49:07.889098012Z caller=manager.go:638 component="rule manager" msg="Stopping rule manager..."
level=info ts=2020-04-28T07:49:07.889109912Z caller=manager.go:644 component="rule manager" msg="Rule manager stopped"
level=info ts=2020-04-28T07:49:07.889124912Z caller=notifier.go:512 component=notifier msg="Stopping notification manager..."
level=info ts=2020-04-28T07:49:07.889137812Z caller=main.go:608 msg="Notifier manager stopped"
level=info ts=2020-04-28T07:49:07.889169012Z caller=web.go:397 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=error ts=2020-04-28T07:49:07.889653412Z caller=main.go:617 err="opening storage failed: lock DB directory: open /data/lock: permission denied"
The problem here is that the PVC is not bound to the PV, primarily because there is no storage class linking the PV with the PVC, and the capacity in the PV (12Gi) does not match the request in the PVC (10Gi). As a result, Kubernetes could not figure out which PV the PVC should be bound to.
Add
storageClassName: manual
to the spec of both the PV and the PVC.
PV:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-storage
  namespace: monitoring
  labels:
    app: prometheus
spec:
  storageClassName: manual
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/data"
PVC:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-storage-claim
  namespace: monitoring
  labels:
    app: prometheus
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
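Once both objects carry storageClassName: manual with matching 10Gi capacity/request, the claim should bind to prometheus-storage. A quick way to confirm before re-creating the Deployment (plain kubectl, nothing specific to this setup):

kubectl get pv prometheus-storage
kubectl get pvc prometheus-storage-claim -n monitoring

Both should show STATUS as Bound; only then will the scheduler stop reporting "pod has unbound immediate PersistentVolumeClaims".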
Update:
Running the pod as root by adding runAsUser: 0 to the pod securityContext should solve the "open /data/lock: permission denied" error.
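A minimal sketch of that change, keeping the fsGroup already present in the posted Deployment (this block goes under spec.template.spec):

securityContext:
  runAsUser: 0   # run the Prometheus process as root so it can create /data/lock on the hostPath volume
  fsGroup: 1000  # unchanged from the original manifest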