How can I attach a 100GB Persistent Volume disk to each Node in an AKS Kubernetes cluster?
We are using Kubernetes on Azure using AKS.
We have a scenario where we need to attach Persistent Volumes to each Node in our AKS Cluster. We run 1 Docker Container on each Node in the Cluster.
The reason to attach volumes dynamically is to increase the IOPS and the amount of storage available to each Docker container so it can do its job.
The program running inside each Docker container works against very large input data files (10GB) and writes out even larger output files (50GB).
We could mount Azure File Shares, but Azure File Shares are limited to 60 MB/s, which is too slow for us to move around this much raw data. Once the program running in the Docker container has completed, it will move the output file (50GB) to Blob Storage. The total of all output files from all the containers may exceed 1TB.
I was thinking that if we can attach a Persistent Volume to each Node, we can increase our available disk space as well as the IOPS without having to go to a high vCPU/RAM VM configuration (e.g. DS14_v2). Our program is more I/O-intensive than CPU-intensive.
All the Docker containers running in the Pods are exactly the same: each reads a message from a queue that tells it which input file to work against.
I've followed the docs to create a StorageClass, a Persistent Volume Claim and a Persistent Volume, and ran this against 1 Pod. https://docs.microsoft.com/en-us/azure/aks/azure-disks-dynamic-pv
However, when I create a Deployment and scale the number of Pods from 1 to 2, I receive the following error (in production we'd scale to as many nodes as necessary, ~100):
Multi-Attach error for volume "pvc-784496e4-869d-11e8-8984-0a58ac1f1e06" Volume is already used by pod(s) pv-deployment-67fd8b7b95-fjn2n
I realize that an Azure Disk can only be attached to a single node (ReadWriteOnce); however, I'm not sure how to create multiple disks and attach them to each Node at the time we load up the Kubernetes cluster and begin our work.
Persistent Volume Claim:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: azure-managed-disk
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: managed-premium
  resources:
    requests:
      storage: 100Gi
This is my Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pv-deployment
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: myfrontend
        image: nginx
        volumeMounts:
        - name: volume
          mountPath: /mnt/azure
        resources:
          limits:
            cpu: ".7"
            memory: "2.5G"
          requests:
            cpu: ".7"
            memory: "2.5G"
      volumes:
      - name: volume
        persistentVolumeClaim:
          claimName: azure-managed-disk
If I knew that I was going to scale to 100 Nodes, would I have to create a .yaml file with 100 Deployments and be explicit for each Deployment to use a specific Volume Claim?
For example, in my volume claims I'd have azure-claim-01, azure-claim-02, etc., and each Deployment would have to reference its own named Volume Claim:
volumes:
- name: volume
  persistentVolumeClaim:
    claimName: azure-claim-01
I can't quite get my head around how to do all this dynamically.
Can you recommend a better way to achieve the desired result?
You should use a StatefulSet with volumeClaimTemplates, configured like the following:
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 80
  selector:
    app: nginx
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: "nginx"
  replicas: 4
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: k8s.gcr.io/nginx-slim:0.8
        ports:
        - containerPort: 80
        volumeMounts:
        - name: persistent-storage
          mountPath: /usr/share/nginx/html
  # One PersistentVolumeClaim is created from this template for each replica.
  volumeClaimTemplates:
  - metadata:
      name: persistent-storage
      annotations:
        # Deprecated annotation; spec.storageClassName is the current way to select a class.
        volume.beta.kubernetes.io/storage-class: hdd
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 2Gi
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: hdd
provisioner: kubernetes.io/azure-disk
parameters:
  skuname: Standard_LRS
  kind: managed
  cachingMode: ReadOnly
You will get a Persistent Volume for every Pod (and so for every Node, if you run one Pod per Node):
kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                              STORAGECLASS   REASON   AGE
pvc-0e651011-7647-11e9-bbf5-c6ab19063099   2Gi        RWO            Delete           Bound    default/persistent-storage-web-0   hdd                     51m
pvc-17181607-7648-11e9-bbf5-c6ab19063099   2Gi        RWO            Delete           Bound    default/persistent-storage-web-1   hdd                     49m
pvc-4d488893-7648-11e9-bbf5-c6ab19063099   2Gi        RWO            Delete           Bound    default/persistent-storage-web-2   hdd                     48m
pvc-6aff2a4d-7648-11e9-bbf5-c6ab19063099   2Gi        RWO            Delete           Bound    default/persistent-storage-web-3   hdd                     47m
And every Pod will get its own dedicated Persistent Volume Claim:
kubectl get pvc
NAME                       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistent-storage-web-0   Bound    pvc-0e651011-7647-11e9-bbf5-c6ab19063099   2Gi        RWO            hdd            55m
persistent-storage-web-1   Bound    pvc-17181607-7648-11e9-bbf5-c6ab19063099   2Gi        RWO            hdd            48m
persistent-storage-web-2   Bound    pvc-4d488893-7648-11e9-bbf5-c6ab19063099   2Gi        RWO            hdd            46m
persistent-storage-web-3   Bound    pvc-6aff2a4d-7648-11e9-bbf5-c6ab19063099   2Gi        RWO            hdd            45m
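Adapted to the workload in the question, the same pattern might look like the sketch below. It is only an illustration under stated assumptions: it uses the `managed-premium` storage class and the 100Gi size from the question, while the names `worker` and `scratch` and the image `myregistry.azurecr.io/worker:latest` are hypothetical placeholders. The `podAntiAffinity` rule is an addition that the manifest above does not have; it is one way to keep at most one Pod on each node.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: worker                      # hypothetical name
spec:
  serviceName: worker               # hypothetical; names the governing Service of the StatefulSet
  replicas: 2                       # raise this up to the number of nodes
  podManagementPolicy: Parallel     # start Pods in parallel rather than one at a time
  selector:
    matchLabels:
      app: worker
  template:
    metadata:
      labels:
        app: worker
    spec:
      affinity:
        podAntiAffinity:
          # Refuse to schedule two Pods with label app=worker on the same node.
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: worker
            topologyKey: kubernetes.io/hostname
      containers:
      - name: worker
        image: myregistry.azurecr.io/worker:latest   # hypothetical image
        volumeMounts:
        - name: scratch
          mountPath: /mnt/azure
  volumeClaimTemplates:
  - metadata:
      name: scratch                 # PVCs are named scratch-worker-0, scratch-worker-1, ...
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: managed-premium
      resources:
        requests:
          storage: 100Gi
```

Increasing `replicas` then provisions one 100Gi disk per new Pod. By default, the claims created this way are kept when the StatefulSet is scaled down or deleted, so the disks have to be cleaned up separately if they are no longer needed.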
I would consider using a DaemonSet. It runs exactly one Pod on each node, so a ReadWriteOnce volume is only ever used by the single Pod on that node. The constraint is that you cannot scale your application beyond the number of nodes you have.
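A DaemonSet has no `volumeClaimTemplates` field, so pointing all of its Pods at one PersistentVolumeClaim would run into the same Multi-Attach error as the Deployment. One possible way around that, sketched below, is a generic ephemeral volume, which creates a claim for each Pod and deletes it together with the Pod. The sketch assumes Kubernetes 1.23 or later (where generic ephemeral volumes are generally available) and the `managed-premium` class from the question; the names `worker` and `scratch` and the image are hypothetical placeholders.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: worker                      # hypothetical name
spec:
  selector:
    matchLabels:
      app: worker
  template:
    metadata:
      labels:
        app: worker
    spec:
      containers:
      - name: worker
        image: myregistry.azurecr.io/worker:latest   # hypothetical image
        volumeMounts:
        - name: scratch
          mountPath: /mnt/azure
      volumes:
      - name: scratch
        # Generic ephemeral volume: a PVC is created per Pod and removed with it.
        ephemeral:
          volumeClaimTemplate:
            spec:
              accessModes: [ "ReadWriteOnce" ]
              storageClassName: managed-premium
              resources:
                requests:
                  storage: 100Gi
```

Data on such a volume does not survive the Pod, which may be acceptable here because the output files are moved to Blob Storage once processing completes.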