Kubernetes version: 1.19.11
Cloud being used: Azure
Installation method: Manual creation in Azure online UI/Azure CLI
Host OS: Linux
CNI and version: Azure container networking interface, most recent
Hey everyone! I'm a relatively new user of Kubernetes, but I think I've got the basics down. I'm mainly trying to understand a more complex file share feature.
I’m essentially trying to use JupyterHub on Kubernetes as a shared development environment for a team of about a dozen users (we may expand this to larger/other teams later, but for now I want to get this working for just our team). One feature that would be extremely helpful, and looks doable, is a shared directory for notebooks, files, and data. I think I’m pretty close to getting this set up, but I’m running into a mounting issue that I can’t quite resolve. I’ll quickly explain my setup first and then the issue. I’d really appreciate any help/comments/hints that anyone has!
Currently, all of this setup is on a Kubernetes cluster in Azure or other Azure-hosted services. We have a resource group with a Kubernetes cluster, App Service Domain, DNS Zone, virtual network, container registry (for our custom Docker images), and storage account. Everything works fine, except that in the storage account, I have an Azure NFS (and plain SMB if needed) file share that I’ve tried mounting via a PV and PVC to a JupyterHub server, but to no avail.
To create the PV, I set up an NFS file share in Azure and created the appropriate Kubernetes secret as follows:
# Get storage account key
STORAGE_KEY=$(az storage account keys list --resource-group "$resourceGroupName" --account-name "$storageAccountName" --query "[0].value" -o tsv)
# Store the account name and key in a secret for the azureFile volume type
kubectl create secret generic azure-secret \
    --from-literal=azurestorageaccountname="$storageAccountName" \
    --from-literal=azurestorageaccountkey="$STORAGE_KEY"
I then tried to create the PV with this YAML file:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-nfs-pv
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany
  azureFile:
    secretName: azure-secret
    shareName: aksshare
    readOnly: false
  nfs:
    server: wintermutessd.file.core.windows.net:/wintermutessd/wintermutessdshare
    path: /home/shared
    readOnly: false
  storageClassName: premium-nfs
  mountOptions:
    - dir_mode=0777
    - file_mode=0777
    - uid=1000
    - gid=1000
    - mfsymlinks
    - nobrl
During the creation of the PV, I get the error:
Failed to create the persistentvolume 'shared-nfs-pv'. Error: Invalid (422) : PersistentVolume "shared-nfs-pv" is invalid: spec.azureFile: Forbidden: may not specify more than 1 volume type
Removing the azureFile options solves this error, but I feel like it would be necessary to specify the Kubernetes secret that I created. If I do remove the azureFile options, it does successfully create and bind the PV. Then I created the corresponding PVC with
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-nfs-pvc
spec:
  accessModes:
    - ReadWriteMany
  # Match name of PV
  volumeName: shared-nfs-pv
  storageClassName: premium-nfs
  resources:
    requests:
      storage: 50Gi
which also successfully bound. However, when I add the configuration to my Helm config for JupyterHub with
singleuser:
  storage:
    extraVolumes:
      - name: azure
        persistentVolumeClaim:
          claimName: azurefile
    extraVolumeMounts:
      - name: azure
        mountPath: /home/shared
I get the following error when the jupyterhub server tries to spawn and mount the PVC:
Just in case this is relevant, the NFS Azure file share is only accessible via a private endpoint, but this should be fine since my Kubernetes cluster is running in the same virtual network. In fact, Azure tells me that I could just mount this NFS share on Linux with
sudo apt-get -y update
sudo apt-get install nfs-common
sudo mkdir -p /mount/wintermutessd/wintermutessdshare
sudo mount -t nfs wintermutessd.file.core.windows.net:/wintermutessd/wintermutessdshare /mount/wintermutessd/wintermutessdshare -o vers=4,minorversion=1,sec=sys
But when I add this to the Dockerfile for the Docker image that I'm using in my container, the build fails and tells me that systemctl isn't installed. Trying to add this through apt-get install systemd doesn't resolve the issue either.
From looking at other K8s discourse posts, I found this one (File based data exchange between pods and daemon-set - General Discussions - Discuss Kubernetes), which looked helpful and has a useful link to deploying an NFS server, but I think the fact that my NFS server is an Azure file share makes this a slightly different scenario.
If anyone has any ideas or suggestions, I'd really appreciate it!
P.S. I had previously posted on the JupyterHub discourse here (Mounting an SMB or NFT Azure File share onto JupyterHub on kubernetes for a shared directory - JupyterHub - Jupyter Community Forum), but it was suggested that my issue is more of a k8s issue than a JupyterHub one. I also looked at this other Stack Overflow post, but, even though I am open to an SMB file share, it has more to do with VMs than with PVs/PVCs on Kubernetes.
Thank you! :)
So I actually managed to figure this out using a dynamically allocated Azure file share. I'm writing internal documentation for this, but I thought I'd post the relevant bit here. I hope this helps people!
Here, we're mainly following the documentation for dynamically creating a PV with Azure Files in AKS. The general idea is to create a storage class that defines what kind of Azure file share we want (premium vs. standard and the different redundancy modes) and then create a PVC (persistent volume claim) that uses that storage class. Kubernetes then automatically provisions a PV (persistent volume) for the PVC to bind to, which in turn creates a storage account and file share for the PV to actually store files in. This will all be done in the resource group that backs the one we're already using (these generally start with "MC_").
Here, we will be using the premium storage class with zone-redundant storage. First, create the storage class to be used (more info on the available parameters can be found in this repository) with the following YAML
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: shared-premium-azurefile
provisioner: kubernetes.io/azure-file
mountOptions:
  - dir_mode=0777
  - file_mode=0777
  - uid=0
  - gid=0
  - mfsymlinks
  - cache=strict
  - actimeo=30
parameters:
  skuName: Premium_ZRS
Name this file azure-file-sc.yaml and run
kubectl apply -f azure-file-sc.yaml
Next, we will create a PVC that dynamically provisions an Azure file share through our storage class (it automatically creates a PV for us). Create the file azure-file-pvc.yaml with the following code
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-premium-azurefile-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: shared-premium-azurefile
  resources:
    requests:
      storage: 100Gi
and apply it with
kubectl apply -f azure-file-pvc.yaml
This will create the file share and the corresponding PV. We can check that our PVC and storage class were successfully created with
kubectl get storageclass
kubectl get pvc
It might take a couple of minutes for the PVC to bind.
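Rather than re-running kubectl get pvc by hand, the wait can be scripted. The following is only a minimal sketch: the function name wait_for_pvc_bound and its timeout/interval arguments are made up for illustration, and it assumes kubectl is already pointed at the right cluster and namespace.

```shell
# Poll a PVC until its status.phase is "Bound" or a timeout is reached.
# Arguments: <pvc-name> [timeout-seconds] [interval-seconds]
# (hypothetical helper, not part of kubectl)
wait_for_pvc_bound() {
  local pvc="$1"
  local timeout="${2:-300}"
  local interval="${3:-5}"
  local waited=0
  local phase=""

  while [ "$waited" -le "$timeout" ]; do
    # An empty result (e.g. the PVC does not exist yet) counts as "not bound".
    phase="$(kubectl get pvc "$pvc" -o jsonpath='{.status.phase}' 2>/dev/null || true)"
    if [ "$phase" = "Bound" ]; then
      echo "PVC $pvc is Bound"
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done

  echo "Timed out after ${timeout}s; last phase: ${phase:-unknown}" >&2
  return 1
}

# Example call (commented out so that sourcing this file has no side effects):
# wait_for_pvc_bound shared-premium-azurefile-pvc 300 5
```

If the function times out, kubectl describe pvc shared-premium-azurefile-pvc usually shows the provisioning events that explain why.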
On the Azure side, this is all that has to be done; the dynamic allocation of the PV and file share is taken care of for us.
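If you want to see what was created for you, something like the sketch below can help. The helper names (pv_for_pvc, node_resource_group, list_storage_accounts) and the $clusterName variable are placeholders invented for this example; the underlying kubectl and az subcommands are standard, but treat the whole thing as an illustration rather than a tested procedure.

```shell
# Print the name of the PV that a PVC is bound to (empty if not bound yet).
pv_for_pvc() {
  kubectl get pvc "$1" -o jsonpath='{.spec.volumeName}'
}

# Print the node resource group (the one that usually starts with "MC_")
# of an AKS cluster. Arguments: <resource-group> <cluster-name>
node_resource_group() {
  az aks show --resource-group "$1" --name "$2" \
    --query nodeResourceGroup --output tsv
}

# List the names of the storage accounts in a resource group.
list_storage_accounts() {
  az storage account list --resource-group "$1" --query "[].name" --output tsv
}

# Example usage (commented out; $clusterName is a placeholder):
# pv="$(pv_for_pvc shared-premium-azurefile-pvc)"
# kubectl describe pv "$pv"
# node_rg="$(node_resource_group "$resourceGroupName" "$clusterName")"
# list_storage_accounts "$node_rg"
```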
JupyterHub, by default, creates a PVC of 10Gi for each new user, but we can also tell it to mount existing PVCs as external volumes (think of this as just plugging your computer into a shared USB drive). To mount our previously created PVC in the home folder of all of our JupyterHub users, we simply add the following to our config.py Helm config:
singleuser:
  storage:
    extraVolumes:
      - name: azure
        persistentVolumeClaim:
          claimName: shared-premium-azurefile-pvc
    extraVolumeMounts:
      - name: azure
        mountPath: /home/jovyan/shared
Now, when JupyterHub starts up, all users should have a shared directory in their home folders with read and write permission.
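To confirm that the share really is writable from inside a user server, one option is to create and remove a marker file with kubectl exec. This is a rough sketch under a few assumptions: the function name check_shared_dir_writable is made up, the pod name and namespace are placeholders you need to look up with kubectl get pods, and the user's server must already be running.

```shell
# Check that a directory inside a running pod is writable by creating
# and then removing a small marker file.
# Arguments: <pod-name> <namespace> [directory]
# (hypothetical helper; pod name and namespace are placeholders)
check_shared_dir_writable() {
  local pod="$1"
  local namespace="$2"
  local dir="${3:-/home/jovyan/shared}"
  local marker="$dir/.write-test-$$"

  # The marker path is passed as a positional argument to the remote shell.
  if kubectl exec "$pod" --namespace "$namespace" -- \
       sh -c 'touch "$1" && rm "$1"' sh "$marker"; then
    echo "OK: $dir is writable in pod $pod"
  else
    echo "FAILED: could not write to $dir in pod $pod" >&2
    return 1
  fi
}

# Example call (commented out; substitute your own pod and namespace):
# check_shared_dir_writable jupyter-alice jhub
```

Running the check against the pods of two different users is a quick way to see that both mount the same share.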