In our cluster we have a horizontally scaling deployment of an application that uses a lot of local disk space, which has been causing major cluster stability problems (Docker crashes, nodes being recreated, etc.).
We are trying to have each pod provision a gcePersistentDisk of its own so that its disk usage is isolated from the rest of the cluster. We created a storage class and a persistent volume claim that uses that class, and specified a volume mount for that claim in our deployment's pod template spec.
However, when we set the autoscaler to use multiple replicas, they apparently try to use the same volume, and we get this error:
Multi-Attach error for volume Volume is already exclusively attached to one node and can't be attached to another
Here are the relevant parts of our manifests. Storage Class:
{
  "apiVersion": "storage.k8s.io/v1",
  "kind": "StorageClass",
  "metadata": {
    "annotations": {},
    "name": "some-storage",
    "namespace": ""
  },
  "parameters": {
    "type": "pd-standard"
  },
  "provisioner": "kubernetes.io/gce-pd"
}
PVC:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: some-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
  storageClassName: some-storage
Deployment:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: some-deployment
spec:
  template:
    spec:
      volumes:
        - name: some-storage
          persistentVolumeClaim:
            claimName: some-pvc
      containers:
        - [omitted]
          volumeMounts:
            - name: some-storage
              mountPath: /var/path
With those applied, we update the deployment's autoscaler to a minimum of 2 replicas and get the above error.
A Deployment is meant to be stateless. There is no way for the deployment controller to determine which disk belongs to which pod once a pod gets rescheduled, which would lead to corrupted state. That is why a Deployment can only have one disk shared across all its pods.
Concerning the error you are seeing:
Multi-Attach error for volume Volume is already exclusively attached to one node and can't be attached to another
You are getting this error because you have pods spread across multiple nodes but only one volume (a Deployment can only have one), and multiple nodes are trying to attach that volume for your deployment's pods. Your volume is a GCE persistent disk, not something like NFS that can be mounted on multiple nodes at the same time. If you do not care about state at all and still want to use a Deployment, you must use a volume type that supports simultaneous mounts from multiple nodes, such as NFS. You would also need to change your PVC's accessModes to ReadWriteMany, as multiple pods would write to the same physical volume.
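As a rough sketch, and assuming a hypothetical NFS-backed storage class named nfs-storage (GCE persistent disks do not support ReadWriteMany), the PVC would look something like this:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: some-pvc
spec:
  accessModes:
    - ReadWriteMany             # all replicas may mount and write the volume simultaneously
  resources:
    requests:
      storage: 20Gi
  storageClassName: nfs-storage # hypothetical NFS-backed class; kubernetes.io/gce-pd cannot provision ReadWriteMany volumes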
If you need a dedicated disk for each pod, then you should use a StatefulSet instead. As the name suggests, its pods are meant to keep state, so you can define a volumeClaimTemplates section in it, which creates a dedicated disk for each pod, as described in the documentation.
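Here is a minimal sketch of what that could look like with your manifests; the service name, labels, and container below are illustrative placeholders, and a StatefulSet additionally needs a governing (usually headless) Service:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: some-statefulset
spec:
  serviceName: some-service        # StatefulSets require a governing Service, typically headless
  replicas: 2
  selector:
    matchLabels:
      app: some-app
  template:
    metadata:
      labels:
        app: some-app
    spec:
      containers:
        - name: some-container     # placeholder; substitute your actual container spec
          image: some-image
          volumeMounts:
            - name: some-storage
              mountPath: /var/path
  volumeClaimTemplates:            # one PVC, and thus one GCE disk, is created per pod
    - metadata:
        name: some-storage
      spec:
        accessModes:
          - ReadWriteOnce          # each disk attaches to a single node at a time
        resources:
          requests:
            storage: 20Gi
        storageClassName: some-storage
Each replica then gets its own claim (some-storage-some-statefulset-0, some-storage-some-statefulset-1, and so on), so scaling up provisions a new disk instead of fighting over an existing one.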