Cannot deploy mongodb StatefulSet with volumes for replicas greater than one

8/16/2019

Context

I'm sharing the /data/db directory, mounted as a Network File System (NFS) volume, among all pods controlled by the StatefulSet.

Problem

With replicas: 1 the StatefulSet deploys mongodb correctly. The problem starts when I scale up (number of replicas greater than one, e.g. replicas: 2): every pod after the first ends up in CrashLoopBackOff status.

Question

I understand the error message (see the Debug section below), but I don't see why it happens. What I'm trying to achieve is a stateful deployment of mongodb, so that data persists even after pods are deleted. Somehow, mongo stops me from doing that because Another mongod instance is already running on the /data/db directory. My questions are: What am I doing wrong? How can I deploy mongodb so that it is stateful and persists data while scaling up the StatefulSet?

Debug

Cluster state

$ kubectl get svc,sts,po,pv,pvc --output=wide
NAME            TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)     AGE   SELECTOR
service/mongo   ClusterIP   None         <none>        27017/TCP   10h   run=mongo

NAME                     READY   AGE     CONTAINERS   IMAGES
statefulset.apps/mongo   1/2     8m50s   mongo        mongo:4.2.0-bionic

NAME          READY   STATUS             RESTARTS   AGE     IP          NODE        NOMINATED NODE   READINESS GATES
pod/mongo-0   1/1     Running            0          8m50s   10.44.0.2   web01       <none>           <none>
pod/mongo-1   0/1     CrashLoopBackOff   6          8m48s   10.36.0.3   compute01   <none>           <none>

NAME                                CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                     STORAGECLASS   REASON   AGE   VOLUMEMODE
persistentvolume/phenex-nfs-mongo   1Gi        RWX            Retain           Bound    phenex-nfs-mongo                           22m   Filesystem

NAME                                     STATUS   VOLUME             CAPACITY   ACCESS MODES   STORAGECLASS   AGE   VOLUMEMODE
persistentvolumeclaim/phenex-nfs-mongo   Bound    phenex-nfs-mongo   1Gi        RWX                           22m   Filesystem

Log

$ kubectl logs -f mongo-1
2019-08-14T23:52:30.632+0000 I  CONTROL  [main] Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'
2019-08-14T23:52:30.635+0000 I  CONTROL  [initandlisten] MongoDB starting : pid=1 port=27017 dbpath=/data/db 64-bit host=mongo-1
2019-08-14T23:52:30.635+0000 I  CONTROL  [initandlisten] db version v4.2.0
2019-08-14T23:52:30.635+0000 I  CONTROL  [initandlisten] git version: a4b751dcf51dd249c5865812b390cfd1c0129c30
2019-08-14T23:52:30.635+0000 I  CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.1.1  11 Sep 2018
2019-08-14T23:52:30.635+0000 I  CONTROL  [initandlisten] allocator: tcmalloc
2019-08-14T23:52:30.635+0000 I  CONTROL  [initandlisten] modules: none
2019-08-14T23:52:30.635+0000 I  CONTROL  [initandlisten] build environment:
2019-08-14T23:52:30.635+0000 I  CONTROL  [initandlisten]     distmod: ubuntu1804
2019-08-14T23:52:30.635+0000 I  CONTROL  [initandlisten]     distarch: x86_64
2019-08-14T23:52:30.635+0000 I  CONTROL  [initandlisten]     target_arch: x86_64
2019-08-14T23:52:30.635+0000 I  CONTROL  [initandlisten] options: { net: { bindIp: "0.0.0.0" }, replication: { replSet: "rs0" } }
2019-08-14T23:52:30.642+0000 I  STORAGE  [initandlisten] exception in initAndListen: DBPathInUse: Unable to lock the lock file: /data/db/mongod.lock (Resource temporarily unavailable). Another mongod instance is already running on the /data/db directory, terminating
2019-08-14T23:52:30.643+0000 I  NETWORK  [initandlisten] shutdown: going to close listening sockets...
2019-08-14T23:52:30.643+0000 I  NETWORK  [initandlisten] removing socket file: /tmp/mongodb-27017.sock
2019-08-14T23:52:30.643+0000 I  -        [initandlisten] Stopping further Flow Control ticket acquisitions.
2019-08-14T23:52:30.643+0000 I  CONTROL  [initandlisten] now exiting
2019-08-14T23:52:30.643+0000 I  CONTROL  [initandlisten] shutting down with code:100

Error

Unable to lock the lock file: /data/db/mongod.lock (Resource temporarily unavailable). 
Another mongod instance is already running on the /data/db directory, terminating
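
Both pods mount the same NFS export at /data/db, and mongod takes an exclusive lock on /data/db/mongod.lock at startup, so the second mongod finds the lock already held and exits with code 100. A quick way to confirm the replicas see the same directory (illustrative commands; the export path comes from the PV above):

$ kubectl exec mongo-0 -- ls -l /data/db/mongod.lock
$ ls -l /nfs/data/phenex/production/permastore/mongo/mongod.lock   # run on the NFS server (master)
# Both commands list the same file: a single export backs every replica.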

YAML files

# StatefulSet
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: mongo
spec:
  serviceName: mongo
  replicas: 2
  selector:
    matchLabels:
      run: mongo
      tier: backend
  template:
    metadata:
      labels:
        run: mongo
        tier: backend
    spec:
      terminationGracePeriodSeconds: 10
      containers:
        - name: mongo
          image: mongo:4.2.0-bionic
          command:
            - mongod
          args:
            - "--replSet=rs0"
            - "--bind_ip=0.0.0.0"
          ports:
            - containerPort: 27017
          volumeMounts:
            - name: phenex-nfs-mongo
              mountPath: /data/db
      volumes:
      - name: phenex-nfs-mongo
        persistentVolumeClaim:
          claimName: phenex-nfs-mongo

# PersistentVolume
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: phenex-nfs-mongo
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 1Gi
  nfs:
    server: master
    path: /nfs/data/phenex/production/permastore/mongo
  claimRef:
    name: phenex-nfs-mongo
  persistentVolumeReclaimPolicy: Retain

# PersistentVolumeClaim
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: phenex-nfs-mongo
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 100Mi
-- Lukasz Dynowski
kubernetes
mongodb

1 Answer

8/16/2019

Problem:

You are deploying more than one pod using the same PVC and PV, so every replica mounts the same /data/db directory and only the first mongod can take the lock.

Solution:

Use volumeClaimTemplates so that each replica gets its own PersistentVolumeClaim and, in turn, its own PersistentVolume.

Example:

# StatefulSet
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongo
spec:
  serviceName: mongo
  replicas: 2
  selector:
    matchLabels:
      run: mongo
      tier: backend
  template:
    metadata:
      labels:
        run: mongo
        tier: backend
    spec:
      terminationGracePeriodSeconds: 10
      containers:
        - name: mongo
          image: mongo:4.2.0-bionic
          command:
            - mongod
          args:
            - "--replSet=rs0"
            - "--bind_ip=0.0.0.0"
          ports:
            - containerPort: 27017
          volumeMounts:
            - name: phenex-nfs-mongo
              mountPath: /data/db
  volumeClaimTemplates:
  - metadata:
      name: phenex-nfs-mongo
    spec:
      accessModes:
        - ReadWriteMany
      resources:
        requests:
          storage: 100Mi
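
With volumeClaimTemplates the StatefulSet controller creates one PVC per replica, named <template>-<statefulset>-<ordinal>, i.e. phenex-nfs-mongo-mongo-0 and phenex-nfs-mongo-mongo-1 here. Since there is no StorageClass doing dynamic provisioning in this setup, each claim still needs its own pre-created PV. A minimal sketch of one such PV, assuming a separate NFS export per replica (the mongo-0 path is illustrative):

# PersistentVolume for the first replica; repeat with -1 for the second
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: phenex-nfs-mongo-0
spec:
  accessModes:
    - ReadWriteMany
  capacity:
    storage: 100Mi
  nfs:
    server: master
    path: /nfs/data/phenex/production/permastore/mongo-0   # one directory per replica
  persistentVolumeReclaimPolicy: Retain

Each replica then starts with its own empty /data/db; keeping the members' data in sync is the job of the mongod replica set (--replSet=rs0), not of the storage layer.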
-- FL3SH
Source: StackOverflow