I'm sharing the /data/db directory, which is mounted as a Network File System (NFS) volume, among all pods controlled by a StatefulSet.
When I set replicas: 1, the StatefulSet deploys mongodb correctly. The problem starts when I scale up (number of replicas greater than one, e.g. replicas: 2). All subsequent pods end up in CrashLoopBackOff status.
I understand the error message (see the Log and Error sections below), but I don't get why it happens. Basically, what I'm trying to achieve is a stateful deployment of mongodb, so that data persists even after pods are deleted. Somehow, mongo stops me from doing that because "Another mongod instance is already running on the /data/db directory".
My questions are: What am I doing wrong? How can I deploy mongodb so that it is stateful and persists data while scaling up the StatefulSet?
Cluster state
$ kubectl get svc,sts,po,pv,pvc --output=wide
NAME            TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)     AGE   SELECTOR
service/mongo   ClusterIP   None         <none>        27017/TCP   10h   run=mongo

NAME                     READY   AGE     CONTAINERS   IMAGES
statefulset.apps/mongo   1/2     8m50s   mongo        mongo:4.2.0-bionic

NAME          READY   STATUS             RESTARTS   AGE     IP          NODE        NOMINATED NODE   READINESS GATES
pod/mongo-0   1/1     Running            0          8m50s   10.44.0.2   web01       <none>           <none>
pod/mongo-1   0/1     CrashLoopBackOff   6          8m48s   10.36.0.3   compute01   <none>           <none>

NAME                                CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM              STORAGECLASS   REASON   AGE   VOLUMEMODE
persistentvolume/phenex-nfs-mongo   1Gi        RWX            Retain           Bound    phenex-nfs-mongo                           22m   Filesystem

NAME                                     STATUS   VOLUME             CAPACITY   ACCESS MODES   STORAGECLASS   AGE   VOLUMEMODE
persistentvolumeclaim/phenex-nfs-mongo   Bound    phenex-nfs-mongo   1Gi        RWX                           22m   Filesystem
Log
$ kubectl logs -f mongo-1
2019-08-14T23:52:30.632+0000 I CONTROL [main] Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'
2019-08-14T23:52:30.635+0000 I CONTROL [initandlisten] MongoDB starting : pid=1 port=27017 dbpath=/data/db 64-bit host=mongo-1
2019-08-14T23:52:30.635+0000 I CONTROL [initandlisten] db version v4.2.0
2019-08-14T23:52:30.635+0000 I CONTROL [initandlisten] git version: a4b751dcf51dd249c5865812b390cfd1c0129c30
2019-08-14T23:52:30.635+0000 I CONTROL [initandlisten] OpenSSL version: OpenSSL 1.1.1 11 Sep 2018
2019-08-14T23:52:30.635+0000 I CONTROL [initandlisten] allocator: tcmalloc
2019-08-14T23:52:30.635+0000 I CONTROL [initandlisten] modules: none
2019-08-14T23:52:30.635+0000 I CONTROL [initandlisten] build environment:
2019-08-14T23:52:30.635+0000 I CONTROL [initandlisten] distmod: ubuntu1804
2019-08-14T23:52:30.635+0000 I CONTROL [initandlisten] distarch: x86_64
2019-08-14T23:52:30.635+0000 I CONTROL [initandlisten] target_arch: x86_64
2019-08-14T23:52:30.635+0000 I CONTROL [initandlisten] options: { net: { bindIp: "0.0.0.0" }, replication: { replSet: "rs0" } }
2019-08-14T23:52:30.642+0000 I STORAGE [initandlisten] exception in initAndListen: DBPathInUse: Unable to lock the lock file: /data/db/mongod.lock (Resource temporarily unavailable). Another mongod instance is already running on the /data/db directory, terminating
2019-08-14T23:52:30.643+0000 I NETWORK [initandlisten] shutdown: going to close listening sockets...
2019-08-14T23:52:30.643+0000 I NETWORK [initandlisten] removing socket file: /tmp/mongodb-27017.sock
2019-08-14T23:52:30.643+0000 I - [initandlisten] Stopping further Flow Control ticket acquisitions.
2019-08-14T23:52:30.643+0000 I CONTROL [initandlisten] now exiting
2019-08-14T23:52:30.643+0000 I CONTROL [initandlisten] shutting down with code:100
Error
Unable to lock the lock file: /data/db/mongod.lock (Resource temporarily unavailable).
Another mongod instance is already running on the /data/db directory, terminating
YAML files
# StatefulSet
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: mongo
spec:
  serviceName: mongo
  replicas: 2
  selector:
    matchLabels:
      run: mongo
      tier: backend
  template:
    metadata:
      labels:
        run: mongo
        tier: backend
    spec:
      terminationGracePeriodSeconds: 10
      containers:
        - name: mongo
          image: mongo:4.2.0-bionic
          command:
            - mongod
          args:
            - "--replSet=rs0"
            - "--bind_ip=0.0.0.0"
          ports:
            - containerPort: 27017
          volumeMounts:
            - name: phenex-nfs-mongo
              mountPath: /data/db
      volumes:
        - name: phenex-nfs-mongo
          persistentVolumeClaim:
            claimName: phenex-nfs-mongo
# PersistentVolume
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: phenex-nfs-mongo
spec:
  accessModes:
    - ReadWriteMany
  capacity:
    storage: 1Gi
  nfs:
    server: master
    path: /nfs/data/phenex/production/permastore/mongo
  claimRef:
    name: phenex-nfs-mongo
  persistentVolumeReclaimPolicy: Retain
# PersistentVolumeClaim
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: phenex-nfs-mongo
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 100Mi
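You are deploying more than one pod using the same PVC and PV, so every replica mounts the same /data/db directory and the second mongod cannot lock /data/db/mongod.lock.
Use volumeClaimTemplates instead, so that the StatefulSet creates a separate PersistentVolumeClaim for each pod (each claim then needs its own PersistentVolume to bind to). For example: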
# StatefulSet
---
# apps/v1beta1 is no longer served as of Kubernetes 1.16; use apps/v1 on newer clusters
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: mongo
spec:
  serviceName: mongo
  replicas: 2
  selector:
    matchLabels:
      run: mongo
      tier: backend
  template:
    metadata:
      labels:
        run: mongo
        tier: backend
    spec:
      terminationGracePeriodSeconds: 10
      containers:
        - name: mongo
          image: mongo:4.2.0-bionic
          command:
            - mongod
          args:
            - "--replSet=rs0"
            - "--bind_ip=0.0.0.0"
          ports:
            - containerPort: 27017
          volumeMounts:
            # must match the name of the volumeClaimTemplates entry below
            - name: phenex-nfs-mongo
              mountPath: /data/db
  # one PersistentVolumeClaim is created per pod from this template
  volumeClaimTemplates:
    - metadata:
        name: phenex-nfs-mongo
      spec:
        accessModes:
          - ReadWriteMany
        resources:
          requests:
            storage: 100Mi
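With a claim template, the StatefulSet names each claim <template name>-<StatefulSet name>-<ordinal>, so here the claims would be phenex-nfs-mongo-mongo-0 and phenex-nfs-mongo-mongo-1. If the cluster has no dynamic provisioner for NFS (the original setup uses a single statically defined PV), each of those claims needs its own PersistentVolume, otherwise the extra claims are likely to stay Pending. A minimal sketch of what that could look like follows. The PV names, the per-replica NFS subdirectories, and the default namespace are assumptions made for illustration and are not taken from the original setup.

```yaml
# Sketch only: one statically provisioned NFS PersistentVolume per replica.
# PV names, NFS subdirectories and the namespace are hypothetical.
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: phenex-nfs-mongo-0            # hypothetical PV name
spec:
  accessModes:
    - ReadWriteMany                   # has to satisfy the access mode requested by the claim template
  capacity:
    storage: 1Gi
  nfs:
    server: master
    # assumed subdirectory; it must already exist on the NFS server
    path: /nfs/data/phenex/production/permastore/mongo/mongo-0
  claimRef:
    name: phenex-nfs-mongo-mongo-0    # claim generated for pod mongo-0
    namespace: default                # assumption: StatefulSet runs in "default"
  persistentVolumeReclaimPolicy: Retain
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: phenex-nfs-mongo-1            # hypothetical PV name
spec:
  accessModes:
    - ReadWriteMany
  capacity:
    storage: 1Gi
  nfs:
    server: master
    # assumed subdirectory; it must already exist on the NFS server
    path: /nfs/data/phenex/production/permastore/mongo/mongo-1
  claimRef:
    name: phenex-nfs-mongo-mongo-1    # claim generated for pod mongo-1
    namespace: default                # assumption: StatefulSet runs in "default"
  persistentVolumeReclaimPolicy: Retain
```

Because every pod now writes to a separate directory, each mongod gets its own mongod.lock file. Scaling to more replicas under this approach means adding one more PV per additional ordinal, or switching to a StorageClass backed by a dynamic provisioner.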