Kubernetes AKS Persistent Volume Disk Claims To Multiple Nodes

7/13/2018

How can I attach 100GB Persistent Volume Disk to Each Node in the AKS Kubernetes Cluster?

We are using Kubernetes on Azure using AKS.

We have a scenario where we need to attach Persistent Volumes to each Node in our AKS Cluster. We run 1 Docker Container on each Node in the Cluster.

The reason to attach volumes dynamically is to increase the available IOPS and the amount of storage that each Docker container needs to do its job.

The program running inside each Docker container works against very large input data files (10 GB) and writes out even larger output files (50 GB).

We could mount Azure File shares, but Azure Files is limited to 60 MB/s, which is too slow for us to move around this much raw data. Once the program running in the Docker image has completed, it will move the output file (50 GB) to Blob Storage. The total of all output files may exceed 1 TB across all the containers.

I was thinking that if we can attach a Persistent Volume to each Node, we can increase our available disk space as well as the IOPS without having to go to a high vCPU/RAM VM configuration (e.g. DS14_v2). Our program is more I/O-intensive than CPU-intensive.

All the Docker images running in the Pods are exactly the same: each reads a message from a queue that tells it which input file to work against.

I've followed the docs to create a StorageClass, Persistent Volume Claims and Persistent Volume and run this against 1 POD. https://docs.microsoft.com/en-us/azure/aks/azure-disks-dynamic-pv

However, when I create a Deployment and scale the number of Pods from 1 to 2, I receive the following error (in production we'd scale to as many nodes as necessary, ~100):

Multi-Attach error for volume "pvc-784496e4-869d-11e8-8984-0a58ac1f1e06" Volume is already used by pod(s) pv-deployment-67fd8b7b95-fjn2n

I realize that an Azure Disk can only be attached to a single node (ReadWriteOnce), however I'm not sure how to create multiple disks and attach them to each Node at the time we load up the Kubernetes Cluster and begin our work.

Persistent Volume Claim:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: azure-managed-disk
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: managed-premium
  resources:
    requests:
      storage: 100Gi

This is my Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: pv-deployment
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: myfrontend
        image: nginx
        volumeMounts:
        - name: volume
          mountPath: /mnt/azure
        resources:
          limits:
            cpu: "0.7"
            memory: "2.5G"
          requests:
            cpu: "0.7"
            memory: "2.5G"
      volumes:
      - name: volume
        persistentVolumeClaim:
          claimName: azure-managed-disk

If I knew that I was going to scale to 100 Nodes, would I have to create .yaml files with 100 Deployments and be explicit in each Deployment about which Volume Claim to use?

For example, in my volume claims I'd have azure-claim-01, azure-claim-02, etc., and each Deployment would have to reference its own named Volume Claim:

volumes:
- name: volume
  persistentVolumeClaim:
    claimName: azure-claim-01

I can't quite get my head around how I can do all this dynamically.

Can you recommend a better way to achieve the desired result?

-- Chris Langston
azure
kubernetes
persistent-volume-claims
persistent-volumes

2 Answers

5/14/2019

You should use a StatefulSet with a volumeClaimTemplates configuration, like the following:

apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 80
  selector:
    app: nginx
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: "nginx"
  replicas: 4
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: k8s.gcr.io/nginx-slim:0.8
          ports:
           - containerPort: 80
          volumeMounts:
            - name: persistent-storage
              mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: persistent-storage
      annotations:
        volume.beta.kubernetes.io/storage-class: hdd
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 2Gi
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: hdd
provisioner: kubernetes.io/azure-disk
parameters:
  skuname: Standard_LRS
  kind: managed
  cachingMode: ReadOnly
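
Since the workload in the question is I/O-bound, a Premium-tier variant of the StorageClass may be worth considering instead of Standard_LRS; premium managed disks offer higher IOPS and throughput, scaling with disk size. A minimal sketch (the `ssd` class name here is just an example, not from the answer above):

```
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: ssd
provisioner: kubernetes.io/azure-disk
parameters:
  skuname: Premium_LRS   # premium SSD-backed managed disks for higher IOPS
  kind: managed
```

Referencing `ssd` instead of `hdd` in the volumeClaimTemplates annotation would then provision a premium disk per replica.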

You will get a Persistent Volume for every replica:

kubectl get pv

NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                              STORAGECLASS   AGE
pvc-0e651011-7647-11e9-bbf5-c6ab19063099   2Gi        RWO            Delete           Bound    default/persistent-storage-web-0   hdd            51m
pvc-17181607-7648-11e9-bbf5-c6ab19063099   2Gi        RWO            Delete           Bound    default/persistent-storage-web-1   hdd            49m
pvc-4d488893-7648-11e9-bbf5-c6ab19063099   2Gi        RWO            Delete           Bound    default/persistent-storage-web-2   hdd            48m
pvc-6aff2a4d-7648-11e9-bbf5-c6ab19063099   2Gi        RWO            Delete           Bound    default/persistent-storage-web-3   hdd            47m

And every replica gets a dedicated Persistent Volume Claim; scaling the StatefulSet up (e.g. `kubectl scale statefulset web --replicas=100`) provisions a new disk per replica automatically:

kubectl get pvc

NAME                       STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistent-storage-web-0   Bound     pvc-0e651011-7647-11e9-bbf5-c6ab19063099   2Gi        RWO            hdd            55m
persistent-storage-web-1   Bound     pvc-17181607-7648-11e9-bbf5-c6ab19063099   2Gi        RWO            hdd            48m
persistent-storage-web-2   Bound     pvc-4d488893-7648-11e9-bbf5-c6ab19063099   2Gi        RWO            hdd            46m
persistent-storage-web-3   Bound     pvc-6aff2a4d-7648-11e9-bbf5-c6ab19063099   2Gi        RWO            hdd            45m
-- lmtx
Source: StackOverflow

7/14/2018

I would consider using a DaemonSet. This ensures that exactly one of your pods runs on each node, so ReadWriteOnce volumes work. The constraint is that you cannot scale your application beyond the number of nodes you have.
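
A minimal sketch of such a DaemonSet (the names and the nginx image are placeholders, not from the question; note that per-node scratch space here comes from the node's own disk via `emptyDir` rather than from a shared PVC, since a single ReadWriteOnce disk claim cannot be attached to multiple nodes):

```
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: worker
spec:
  selector:
    matchLabels:
      app: worker
  template:
    metadata:
      labels:
        app: worker
    spec:
      containers:
      - name: worker
        image: nginx   # placeholder for the queue-processing image
        volumeMounts:
        - name: scratch
          mountPath: /mnt/scratch
      volumes:
      - name: scratch
        emptyDir: {}   # node-local scratch space; one pod (and one directory) per node
```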

-- Bal Chua
Source: StackOverflow