Kubernetes: how to change the accessModes of an autoscaled pod to ReadOnlyMany?

6/8/2017

I'm trying HPA: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/

PV:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: api-orientdb-pv
  labels:
    app: api-orientdb
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  gcePersistentDisk:
    pdName: api-orientdb-{{ .Values.cluster.name | default "testing" }}
    fsType: ext4

PVC:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: api-orientdb-pv-claim
  labels:
    app: api
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  selector:
    matchLabels:
      app: api-orientdb
  storageClassName: ""

HPA:

Name:                           api-orientdb-deployment
Namespace:                      default
Labels:                         <none>
Annotations:                        <none>
CreationTimestamp:                  Thu, 08 Jun 2017 10:37:06 +0700
Reference:                      Deployment/api-orientdb-deployment
Metrics:                        ( current / target )
  resource cpu on pods  (as a percentage of request):   17% (8m) / 10%
Min replicas:                       1
Max replicas:                       2
Events:                         <none>
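
For reference, a manifest that would roughly reproduce this HPA looks like the following. This is a sketch reconstructed from the describe output above (the autoscaling/v1 API is assumed), not the original file:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: api-orientdb-deployment
  namespace: default
spec:
  scaleTargetRef:
    kind: Deployment
    name: api-orientdb-deployment    # from the Reference line above
  minReplicas: 1
  maxReplicas: 2
  targetCPUUtilizationPercentage: 10 # target from the Metrics line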

A new pod has been created:

NAME                                       READY     STATUS    RESTARTS   AGE
api-orientdb-deployment-2506639415-n8nbt   1/1       Running   0          7h
api-orientdb-deployment-2506639415-x8nvm   1/1       Running   0          6h

As you can see, I'm using gcePersistentDisk, which does not support the ReadWriteMany access mode.

The newly created pod also mounts the volume in rw mode:

Name:        api-orientdb-deployment-2506639415-x8nvm
Containers:
    Mounts:
      /orientdb/databases from api-orientdb-persistent-storage (rw)
Volumes:
  api-orientdb-persistent-storage:
    Type:   PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  api-orientdb-pv-claim
    ReadOnly:   false

Question: How does it work in this case? Is there a way to configure the main pod (n8nbt) to use a PV with the ReadWriteOnce access mode, while all other scaled pods (x8nvm) mount it as ReadOnlyMany? How can this be done automatically?

The only way I can think of is to create another PVC that mounts the same disk but with different accessModes, but then the question becomes: how do I configure the newly scaled pods to use that PVC?
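
To make that idea concrete, such a pair might look like the sketch below. The -ro names are hypothetical, and it still leaves the second question open (which pods should claim it):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: api-orientdb-pv-ro          # hypothetical name
  labels:
    app: api-orientdb-ro
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadOnlyMany
  gcePersistentDisk:
    pdName: api-orientdb-{{ .Values.cluster.name | default "testing" }}
    fsType: ext4
    readOnly: true                  # attach the same disk read-only
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: api-orientdb-pv-claim-ro    # hypothetical name
spec:
  accessModes:
    - ReadOnlyMany
  resources:
    requests:
      storage: 10Gi
  selector:
    matchLabels:
      app: api-orientdb-ro
  storageClassName: ""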


Fri Jun 9 11:29:34 ICT 2017

I found something: there is nothing ensuring that the newly scaled pod will run on the same node as the first pod. So, if the volume plugin does not support ReadWriteMany and the scaled pod is scheduled on another node, it will fail to mount:

Failed to attach volume "api-orientdb-pv" on node "gke-testing-default-pool-7711f782-4p6f" with: googleapi: Error 400: The disk resource 'projects/xx/zones/us-central1-a/disks/api-orientdb-testing' is already being used by 'projects/xx/zones/us-central1-a/instances/gke-testing-default-pool-7711f782-h7xv'

https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes

Important! A volume can only be mounted using one access mode at a time, even if it supports many. For example, a GCEPersistentDisk can be mounted as ReadWriteOnce by a single node or ReadOnlyMany by many nodes, but not at the same time.

If so, is the only way to ensure that the HPA works to use a volume plugin that supports the ReadWriteMany access mode?
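
For comparison, with a plugin that does support ReadWriteMany (NFS, for instance) every replica could mount the same PV read-write regardless of node. A minimal sketch, with a hypothetical server and export path:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: api-orientdb-pv-rwx         # hypothetical name
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany                 # supported by the NFS plugin
  nfs:
    server: nfs.example.internal    # hypothetical NFS server
    path: /exports/orientdb         # hypothetical export path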


Fri Jun 9 14:28:30 ICT 2017

If you want only one Pod to be able to write, then create two Deployments: one with replicas: 1 and the other with the autoscaler attached (and readOnly: true in it).

OK.

Do note that a GCE PD can only be mounted by a single node if any of the Pods are accessing it readWrite.

Then I have to use label selectors (or affinity) to ensure that all pods end up on the same node, right? (See the sketch after this exchange.)

Your question is not clear to me

Let me explain: in the case of autoscaling, assume that by using label selectors I can ensure the newly scaled pod ends up on the same node. But since the volume is mounted as rw, does having 2 pods mount the same volume as rw break the GCE PD?

First of all, generally, if you have a Deployment with replicas: 1 you won't have 2 Pods running at the same time (most of the time!!)

I know.

On the other hand if a PVC specifies ReadWriteOnce then after the first Pod is scheduled any other Pods will need to be scheduled on the same node or not be scheduled at all (most common case: there aren't enough resources on the Node)

In the case of HPA, it does not. Please see my updates above for more details.

If for any reason you do have 2 Pods accessing the same mount readWrite, then what happens is completely up to the application and is not Kubernetes-specific.

The main thing that confuses me is:

ReadWriteOnce – the volume can be mounted as read-write by a single node

OK: node, not pod. But in the case of autoscaling, if 2 pods are running on the same node and both mount the volume as rw, does GCE PD support that? If so, how does it work?
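
On pinning all pods to one node (raised earlier in this exchange): a plain label selector is not enough; it would take something like pod affinity. A sketch of the relevant Deployment fragment, assuming the pods carry the label app: api:

spec:
  template:
    spec:
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: api                        # assumed pod label
              topologyKey: kubernetes.io/hostname # co-locate on one node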

-- quanta
autoscaling
google-cloud-platform
google-kubernetes-engine
kubernetes

2 Answers

10/17/2018

I think we can use a StatefulSet so that each replica has its own PV; a sketch follows below.

https://cloud.google.com/kubernetes-engine/docs/concepts/persistent-volumes#deployments_vs_statefulsets

Even Deployments with one replica using a ReadWriteOnce Volume are not recommended. This is because the default Deployment strategy will create a second Pod before bringing down the first Pod on a recreate. The Deployment may fail in deadlock as the second Pod can't start because the ReadWriteOnce Volume is already in use, and the first Pod won't be removed because the second Pod has not yet started. Instead, use a StatefulSet with ReadWriteOnce volumes.

StatefulSets are the recommended method of deploying stateful applications that require a unique volume per replica. By using StatefulSets with Persistent Volume Claim Templates you can have applications that can scale up automatically with unique Persistent Volume Claims associated to each replica Pod.
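
A minimal sketch of that approach (the image tag is hypothetical): volumeClaimTemplates gives each replica its own PVC, so ReadWriteOnce is fine per replica.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: api-orientdb
spec:
  serviceName: api-orientdb
  replicas: 2
  selector:
    matchLabels:
      app: api-orientdb
  template:
    metadata:
      labels:
        app: api-orientdb
    spec:
      containers:
        - name: orientdb
          image: orientdb:2.2       # hypothetical image/tag
          volumeMounts:
            - name: databases
              mountPath: /orientdb/databases
  volumeClaimTemplates:
    - metadata:
        name: databases             # each replica gets its own databases-api-orientdb-N claim
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi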

-- lx8
Source: StackOverflow

6/8/2017

It's working as intended. The Once in ReadWriteOnce refers to the number of Nodes that can use the PVC and not the number of Pods (HPA or no HPA).

If you want only one Pod to be able to write, then create two Deployments: one with replicas: 1 and the other with the autoscaler attached (and readOnly: true in it). Do note that a GCE PD can only be mounted by a single node if any of the Pods are accessing it readWrite.
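
A sketch of the second (autoscaled, read-only) Deployment; the writer Deployment would be the same minus readOnly and pinned at replicas: 1. The -readers name and image tag are hypothetical:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-orientdb-readers        # hypothetical name
spec:
  replicas: 1                       # the HPA scales this Deployment
  selector:
    matchLabels:
      app: api-orientdb-readers
  template:
    metadata:
      labels:
        app: api-orientdb-readers
    spec:
      containers:
        - name: orientdb
          image: orientdb:2.2       # hypothetical image/tag
          volumeMounts:
            - name: api-orientdb-persistent-storage
              mountPath: /orientdb/databases
              readOnly: true        # mount read-only inside the container
      volumes:
        - name: api-orientdb-persistent-storage
          persistentVolumeClaim:
            claimName: api-orientdb-pv-claim
            readOnly: true          # request a read-only attach of the PD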

-- Janos Lenart
Source: StackOverflow