Unable to mount read-only Kubernetes persistent volume across n deployment replicas

11/1/2016

I have created a Kubernetes ReadOnlyMany persistent volume from a gcePersistentDisk like so:

apiVersion: v1
kind: PersistentVolume
metadata:
    name: ferret-pv-1
spec:
    capacity:
      storage: 500Gi
    accessModes:
      - ReadOnlyMany
    persistentVolumeReclaimPolicy: Retain
    gcePersistentDisk:
      pdName: data-1
      partition: 1
      fsType: ext4

It creates the persistent volume from the existing gcePersistentDisk partition which already has an ext4 filesystem on it:

$ kubectl get pv
NAME          CAPACITY   ACCESSMODES   RECLAIMPOLICY   STATUS    CLAIM                    REASON    AGE
ferret-pv-1   500Gi      ROX           Retain          Bound     default/ferret-pvc             5h

I then create a Kubernetes ReadOnlyMany persistent volume claim like so:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: ferret-pvc
spec:
  accessModes:
    - ReadOnlyMany
  resources:
    requests:
      storage: 500Gi

It binds to the read-only PV I created above:

$ kubectl get pvc
NAME         STATUS    VOLUME        CAPACITY   ACCESSMODES   AGE
ferret-pvc   Bound     ferret-pv-1   500Gi      ROX           5h

I then create a Kubernetes deployment with 2 replicas using the PVC I just created like so:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: ferret2-deployment
spec:
  replicas: 2
  template:
    metadata:
      labels:
        name: ferret2
    spec:
      containers:
      - image: us.gcr.io/centered-router-102618/ferret2
        name: ferret2
        ports:
        - name: fjds
          containerPort: 1004
          hostPort: 1004
        volumeMounts:
          - name: ferret-pd
            mountPath: /var/ferret
            readOnly: true
      volumes:
          - name: ferret-pd
            persistentVolumeClaim:
              claimName: ferret-pvc

The deployment is created:

$ kubectl get deployments
NAME                 DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
ferret2-deployment   2         2         2            1           4h

However, when I look at the corresponding two pods from the deployment, only the first one came up:

$ kubectl get pods
NAME                                  READY     STATUS              RESTARTS   AGE
ferret2-deployment-1336109949-2rfqd   1/1       Running             0          4h
ferret2-deployment-1336109949-yimty   0/1       ContainerCreating   0          4h

Looking at the second pod which didn't come up:

$ kubectl describe pod ferret2-deployment-1336109949-yimty

Events:
  FirstSeen     LastSeen        Count   From                            SubObjectPath   Type        Reason      Message
  ---------     --------        -----   ----                            -------------   --------        ------      -------
  4h        1m          128     {kubelet gke-sim-cluster-default-pool-e38a7605-kgdu}            Warning     FailedMount     Unable to mount volumes for pod "ferret2-deployment-1336109949-yimty_default(d1393a2d-9fc9-11e6-a873-42010a8a009e)": timeout expired waiting for volumes to attach/mount for pod "ferret2-deployment-1336109949-yimty"/"default". list of unattached/unmounted volumes=[ferret-pd]
  4h        1m          128     {kubelet gke-sim-cluster-default-pool-e38a7605-kgdu}            Warning     FailedSync      Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "ferret2-deployment-1336109949-yimty"/"default". list of unattached/unmounted volumes=[ferret-pd]
  4h        55s         145     {controller-manager }                           Warning     FailedMount     Failed to attach volume "ferret-pv-1" on node "gke-sim-cluster-default-pool-e38a7605-kgdu" with: googleapi: Error 400: The disk resource 'data-1' is already being used by 'gke-sim-cluster-default-pool-e38a7605-fyx4'

It's refusing to start up the second pod because it thinks the first one has exclusive use of the PV. However, when I log in to the first pod, which claimed the PV, I see it has mounted the volume read-only:

$ kubectl exec -ti ferret2-deployment-1336109949-2rfqd -- bash
root@ferret2-deployment-1336109949-2rfqd:/opt/ferret# mount | grep ferret
/dev/sdb1 on /var/ferret type ext4 (ro,relatime,data=ordered)

Am I missing something regarding mounting a PV read-only across multiple pods in a deployment using the same PVC? The disk is not mounted by any other containers. Since it mounted read-only on the first pod, I would have expected the second and any other replicas in the deployment to have no problem claiming and mounting it. Also, how would I get ReadWriteOnce to work properly, and how do I specify which pod mounts the volume read-write?

-- JonB
google-compute-engine
kubernetes

2 Answers

8/12/2018

To back a volume with a gcePersistentDisk, the disk must first be attached to the VM instance on which the pods using the volume are running.

Kubernetes does this automatically, but in my experience, even with this manifest:

apiVersion: v1
kind: PersistentVolume
metadata:
    name: map-service-pv
spec:
    capacity:
      storage: 25Gi
    accessModes:
      - ReadOnlyMany
    persistentVolumeReclaimPolicy: Retain
    storageClassName: ssd
    gcePersistentDisk:
      pdName: map-service-data
      readOnly: true
      fsType: ext4

it attaches the disk to an instance in read-write (RW) mode. This prevents the disk from being attached to any other instance, so if your pods run on different nodes (instances), all but one of them will get googleapi: Error 400: The disk resource xxx is already being used by....

You can check this in the Google Cloud Console: go to Compute Engine -> Disks, find your disk, and click the "In use by" link, which takes you to the instance. There you can see the additional disks and their modes.

The mode can be changed manually in the console; the second pod should then be able to mount the volume.


EDIT: This solution doesn't seem to work. I've opened an issue on Kubernetes' GitHub: https://github.com/kubernetes/kubernetes/issues/67313

-- Jen
Source: StackOverflow

9/6/2018

The PV/PVC access mode is only used when binding PVs to PVCs.

In your pod template, make sure you set spec.volumes[x].persistentVolumeClaim.readOnly to true. This ensures the volume is attached in read-only mode.

Also in your pod template, make sure you set spec.containers[x].volumeMounts[x].readOnly to true. This ensures the volume is mounted in read-only mode.
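
For reference, here is a minimal sketch of how the pod template from the question's Deployment might look with both flags set. The names and image are taken from the question; only the two readOnly fields are additions:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: ferret2-deployment
spec:
  replicas: 2
  template:
    metadata:
      labels:
        name: ferret2
    spec:
      containers:
      - image: us.gcr.io/centered-router-102618/ferret2
        name: ferret2
        volumeMounts:
        - name: ferret-pd
          mountPath: /var/ferret
          readOnly: true          # mount the filesystem read-only inside the container
      volumes:
      - name: ferret-pd
        persistentVolumeClaim:
          claimName: ferret-pvc
          readOnly: true          # attach the underlying disk in read-only mode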

Also, since you are pre-provisioning your PVs, make sure to set the claimRef field on your PV so that no other PVC accidentally gets bound to it. See https://stackoverflow.com/a/34323691
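
A minimal sketch of what that could look like on the PV from the question; the claimRef values are assumptions based on the PVC name and the default namespace, and readOnly on the gcePersistentDisk follows the manifest in the earlier answer:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: ferret-pv-1
spec:
  capacity:
    storage: 500Gi
  accessModes:
    - ReadOnlyMany
  persistentVolumeReclaimPolicy: Retain
  claimRef:                       # reserve this PV for one specific PVC
    name: ferret-pvc              # assumed PVC name, taken from the question
    namespace: default            # assumed namespace
  gcePersistentDisk:
    pdName: data-1
    partition: 1
    readOnly: true                # attach the GCE PD read-only
    fsType: ext4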

-- Saad Ali
Source: StackOverflow