Mounting an SMB or NFS Azure File share onto JupyterHub on Kubernetes for a shared directory

7/15/2021

Cluster information:

Kubernetes version: 1.19.11

Cloud being used: Azure

Installation method: Manual creation in Azure online UI/Azure CLI

Host OS: Linux

CNI and version: Azure container networking interface, most recent

Hey everyone! I'm a relatively new user of Kubernetes, but I think I've got the basics down. I'm mainly trying to understand a more complex file share feature.

I’m essentially trying to use JupyterHub on Kubernetes for a shared development environment for a team of about a dozen users (we may expand this to larger/other teams later, but for now I want to get this working for just our team), and one feature that would be extremely helpful, and looks doable, is having a shared directory for notebooks, files, and data. I think I’m pretty close to getting this set-up, but I’m running into a mounting issue that I can’t quite resolve. I’ll quickly explain my setup first and then the issue. I’d really appreciate any help/comments/hints that anyone has!

Setup

Currently, all of this setup is on a Kubernetes cluster in Azure or other Azure-hosted services. We have a resource group with a kubernetes cluster, App Service Domain, DNS Zone, virtual network, container registry (for our custom docker images), and storage account. Everything works fine, except that in the storage account, I have an Azure NFS (and plain SMB if needed) file share that I’ve tried mounting via a PV and PVC to a JupyterHub server, but to no avail.

To create the PV, I set up an NFS file share in Azure and created the appropriate kubernetes secret as follows:

# Get storage account key
STORAGE_KEY=$(az storage account keys list --resource-group $resourceGroupName --account-name $storageAccountName --query "[0].value" -o tsv)

kubectl create secret generic azure-secret \
    --from-literal=azurestorageaccountname=$storageAccountName \
    --from-literal=azurestorageaccountkey=$STORAGE_KEY

I then tried to create the PV with this YAML file:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-nfs-pv
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany
  azureFile:
    secretName: azure-secret
    shareName: aksshare
    readOnly: false
  nfs:
    server: wintermutessd.file.core.windows.net:/wintermutessd/wintermutessdshare
    path: /home/shared
    readOnly: false
  storageClassName: premium-nfs
  mountOptions: 
  - dir_mode=0777
  - file_mode=0777
  - uid=1000
  - gid=1000
  - mfsymlinks
  - nobrl

Issue

During the creation of the PV, I get the error Failed to create the persistentvolume 'shared-nfs-pv'. Error: Invalid (422) : PersistentVolume "shared-nfs-pv" is invalid: spec.azureFile: Forbidden: may not specify more than 1 volume type. Removing the azureFile section resolves the error, but I assumed it was necessary in order to reference the Kubernetes secret I created. With the azureFile section removed, the PV is created and bound successfully. I then created the corresponding PVC with
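
For reference, since a PV may declare only one volume source, an NFS-only version of the PV might look like the sketch below. This assumes the account-key secret isn't actually needed for NFS (NFS Azure file shares are authorized at the network level rather than with the storage key), and it drops the mount options from my YAML above, since those are SMB/CIFS options:

```yaml
# Sketch of an NFS-only PV (assumption: no azureFile/secretName needed for NFS)
apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-nfs-pv
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany
  storageClassName: premium-nfs
  nfs:
    # The server field takes the bare hostname; the share path goes in "path"
    server: wintermutessd.file.core.windows.net
    path: /wintermutessd/wintermutessdshare
    readOnly: false
```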

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-nfs-pvc
spec:
  accessModes:
    - ReadWriteMany
  # Match name of PV
  volumeName: shared-nfs-pv
  storageClassName: premium-nfs
  resources:
    requests:
      storage: 50Gi

which also successfully bound. However, when I add the configuration to my Helm config for JupyterHub with

singleuser:
  storage:
    extraVolumes:
      - name: azure
        persistentVolumeClaim:
          claimName: azurefile
    extraVolumeMounts:
      - name: azure
        mountPath: /home/shared

I get the following error when the jupyterhub server tries to spawn and mount the PVC:

[Screenshot: error message from the JupyterHub spawner]

Just in case this is relevant: the NFS Azure file share is only accessible via a private endpoint, but this should be fine since my Kubernetes cluster runs in the same virtual network. In fact, Azure tells me that I could just mount this NFS share on Linux with

sudo apt-get -y update
sudo apt-get install nfs-common
sudo mkdir -p /mount/wintermutessd/wintermutessdshare
sudo mount -t nfs wintermutessd.file.core.windows.net:/wintermutessd/wintermutessdshare /mount/wintermutessd/wintermutessdshare -o vers=4,minorversion=1,sec=sys

But when I add this to the Dockerfile for the image I'm using in my container, the build fails and tells me that systemctl isn't installed. Adding apt-get install systemd doesn't resolve the issue either.

From looking at other K8s discourse posts, I found this one (File based data exchange between pods and daemon-set - General Discussions - Discuss Kubernetes), which looked helpful and has a useful link for deploying an NFS server, but I think the fact that my NFS server is an Azure file share makes this a slightly different scenario.

If anyone has any ideas or suggestions, I'd really appreciate it!

P.S. I had previously posted on the JupyterHub discourse here (Mounting an SMB or NFT Azure File share onto JupyterHub on kubernetes for a shared directory - JupyterHub - Jupyter Community Forum), but it was suggested that my issue is more of a k8s issue than a JupyterHub one. I also looked at another Stack Overflow post, but, even though I am open to an SMB file share, it deals more with VMs than with PVs/PVCs on Kubernetes.

Thank you! :)

-- nbingo
azure
docker
jupyterhub
kubernetes
kubernetes-helm

1 Answer

7/16/2021

So I actually managed to figure this out using a dynamically allocated Azure file share. I'm writing up internal documentation for this, but I thought I'd post the relevant bit here. I hope this helps people!

Dynamically creating an Azure file share and storage account by defining a PVC and storage class

Here, we're mainly following the documentation for dynamically creating a PV with Azure Files in AKS. The general idea is to create a storage class that defines what kind of Azure file share we want (premium vs. standard, and the different redundancy modes) and then create a PVC (persistent volume claim) that adheres to that storage class. Consequently, when JupyterHub tries to mount the PVC we created, it will automatically create a PV (persistent volume) for the PVC to bind to, which in turn automatically creates a storage account and file share for the PV to actually store files in. This all happens in the resource group that backs the one we're already using (these generally start with "MC_").

Here, we will be using the premium storage class with zone-redundant storage. First, create the storage class (more info on the available tags can be found in this repository) with the following YAML:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: shared-premium-azurefile
provisioner: kubernetes.io/azure-file
mountOptions:
  - dir_mode=0777
  - file_mode=0777
  - uid=0
  - gid=0
  - mfsymlinks
  - cache=strict
  - actimeo=30
parameters:
  skuName: Premium_ZRS

Name this file azure-file-sc.yaml and run

kubectl apply -f azure-file-sc.yaml

Next, we will create a PVC, which will dynamically provision an Azure file share for us (it automatically creates a PV as well). Create the file azure-file-pvc.yaml with the following code:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-premium-azurefile-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: shared-premium-azurefile
  resources:
    requests:
      storage: 100Gi

and apply it with

kubectl apply -f azure-file-pvc.yaml

This will create the file share and the corresponding PV. We can check that our PVC and storage class were successfully created with

kubectl get storageclass
kubectl get pvc

It might take a couple of minutes for the PVC to bind.
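
If you'd rather block until binding completes instead of checking by hand, a small polling loop like this works (a hypothetical helper; adjust the PVC name and timeout as needed):

```shell
# Poll the PVC's phase every 5 seconds until it reports Bound (up to ~5 minutes)
for i in $(seq 1 60); do
    phase=$(kubectl get pvc shared-premium-azurefile-pvc -o jsonpath='{.status.phase}')
    if [ "$phase" = "Bound" ]; then
        echo "PVC is bound"
        break
    fi
    sleep 5
done
```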

On the Azure side, this is all that has to be done; the dynamic allocation of the PV and file share is taken care of for us.

Mounting the PVC to JupyterHub in the home directory

JupyterHub, by default, creates a 10Gi PVC for each new user, but we can also tell it to mount existing PVCs as external volumes (think of it as plugging your computer into a shared USB drive). To mount our previously created PVC in the home folder of all of our JupyterHub users, we simply add the following to our config.yaml Helm config:

singleuser:
  storage:
    extraVolumes:
      - name: azure
        persistentVolumeClaim:
          claimName: shared-premium-azurefile-pvc
    extraVolumeMounts:
      - name: azure
        mountPath: /home/jovyan/shared

Now, when JupyterHub starts up, all users should have a shared directory in their home folders with read and write permissions.
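
As a quick sanity check from inside any user's notebook, you can verify that the shared mount is actually writable. The path below assumes the default jovyan home from the config above; the helper itself is just an illustrative sketch:

```python
import os

def check_shared_rw(path="/home/jovyan/shared"):
    """Return True if we can create, read back, and delete a file in `path`."""
    probe = os.path.join(path, ".rw-probe")
    try:
        with open(probe, "w") as f:
            f.write("ok")
        with open(probe) as f:
            content = f.read()
        os.remove(probe)
        return content == "ok"
    except OSError:
        return False
```

Running check_shared_rw() from two different users' notebooks (and having each read a file the other wrote) confirms the directory really is shared.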

-- nbingo
Source: StackOverflow