Kubernetes (Minikube) volume is mounted (from Airflow Helm values) but remains empty

9/16/2021

I would like to deploy Airflow locally on Minikube and have a local folder mounted for DAGs handling.

Airflow is deployed like this:

helm install $AIRFLOW_NAME apache-airflow/airflow \
    --values values.yml \
    --set logs.persistence.enabled=true \
    --namespace $AIRFLOW_NAMESPACE \
    --kubeconfig ~/.kube/config

The values.yml looks like this:

executor: KubernetesExecutor
config:
  core:
    dags_folder: /dags
webserver:
  extraVolumes:
    - name: dags
      hostPath:
        path: /path/dags
  extraVolumeMounts:
    - name: dags
      mountPath: /dags

kubectl describe pods airflow-webserver --kubeconfig ~/.kube/config --namespace airflow:

Volumes:
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      airflow-airflow-config
    Optional:  false
  logs:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  airflow-logs
    ReadOnly:   false
  dags:
    Type:          HostPath (bare host directory volume)
    Path:          /path/dags/
    HostPathType:  
  airflow-webserver-token-xtq9h:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  airflow-webserver-*
    Optional:    false
QoS Class:       BestEffort

The volume dags appears to be correctly mounted but remains empty. What could cause this behaviour?

Edit: kubectl describe pods airflow-scheduler-0 --kubeconfig ~/.kube/config --namespace airflow

    Mounts:
      /opt/airflow/airflow.cfg from config (ro,path="airflow.cfg")
      /opt/airflow/dags from dags (rw)
      /opt/airflow/logs from logs (rw)
      /opt/airflow/pod_templates/pod_template_file.yaml from config (ro,path="pod_template_file.yaml")
      /var/run/secrets/kubernetes.io/serviceaccount from airflow-scheduler-token-9zfpv (ro)
Volumes:
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      airflow-airflow-config
    Optional:  false
  dags:
    Type:          HostPath (bare host directory volume)
    Path:          /path/dags
    HostPathType:  
  logs:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  airflow-logs
    ReadOnly:   false
  airflow-scheduler-token-9zfpv:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  airflow-scheduler-token-9zfpv
    Optional:    false
-- val
airflow
docker-volume
kubernetes
minikube

3 Answers

9/17/2021

The reason your mounted folder is empty is, I guess, that you are using Docker Desktop on macOS (that's my wild guess).

Your dags have to be present in the /path/dags folder on the host; however, in the case of Docker Desktop on macOS (and similarly on Windows), the host is a virtual machine, not macOS/Windows itself.

In this case you need to make sure that /path/dags from your machine is also shared with the virtual machine. By default only the following folders are shared:

Via: https://docs.docker.com/desktop/mac/

By default the /Users, /Volumes, /private, /tmp and /var/folders directories are shared

You can add new folders from your host to be shared as well (see the "File sharing" chapter in the document above).
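
If you want to verify whether a given folder is actually shared through to the VM, one quick (assumed) check is to bind-mount it into a throwaway container and list it:

# If this fails or shows nothing while files exist on the host,
# /path/dags is probably not in Docker Desktop's shared folders.
docker run --rm -v /path/dags:/shared-dags alpine ls -la /shared-dags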

-- Jarek Potiuk
Source: StackOverflow

9/17/2021

I was completely misreading the hostPath parameter as a path on my local machine. hostPath actually refers to the Minikube node running the pod.

  extraVolumes:
    - name: dags
      hostPath:
        path: /mnt/airflow/dags
        type: Directory
  extraVolumeMounts:
    - name: dags
      mountPath: /opt/airflow/dags

This will mount a volume between the Minikube node and the pod. The path /mnt/airflow/dags does not need to exist on the local machine; it lives on the Minikube node.

The local DAGs folder can then be mounted into the Minikube node:

minikube mount ./dags/:/mnt/airflow/dags
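
Note that minikube mount runs in the foreground, so it has to keep running in a separate terminal while Airflow is up. A quick way to check that the node actually sees the DAG files (using the paths above):

# Run from another terminal while "minikube mount" is still active.
minikube ssh -- ls -la /mnt/airflow/dags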

See: https://medium.com/@ipeluffo/running-apache-airflow-locally-on-kubernetes-minikube-31f308e3247a

-- val
Source: StackOverflow

9/16/2021

Assuming that you have some dags in /path/dags already, you should mount your dags folder to the scheduler, not to the webserver (if you are using Airflow 2). The scheduler is the one that parses dags; the webserver only displays them based on information stored in the DB, so it does not actually need the DAG files (it used to need them in Airflow 1.10 without DAG serialization).

Also, I guess you should use LocalExecutor rather than KubernetesExecutor if you want to execute dags from a local folder - then the dags mounted into the scheduler will be available to the task processes spawned from the scheduler in the same container. A sketch of such a values.yml follows below.
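
As an illustration only (a sketch, not verified against the chart), mounting the dags into the scheduler with LocalExecutor could look roughly like this, assuming the chart exposes scheduler.extraVolumes and scheduler.extraVolumeMounts the same way it does for the webserver:

executor: LocalExecutor
config:
  core:
    dags_folder: /opt/airflow/dags
scheduler:
  extraVolumes:
    - name: dags
      hostPath:
        path: /path/dags          # must exist on the node running the scheduler pod
        type: Directory
  extraVolumeMounts:
    - name: dags
      mountPath: /opt/airflow/dags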

If you want to run the Kubernetes Executor and mount a host folder, I believe you will need to add it as a mount in your pod template file (you can generate such a pod template file using the Airflow CLI).

See https://airflow.apache.org/docs/apache-airflow/stable/executor/kubernetes.html#pod-template-file
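
For illustration, a minimal pod template with the dags folder mounted via hostPath might look roughly like this (image tag and paths are assumptions, not chart defaults; see the linked docs for the authoritative structure):

apiVersion: v1
kind: Pod
metadata:
  name: dummy-name
spec:
  containers:
    - name: base                      # Airflow expects the worker container to be named "base"
      image: apache/airflow:2.1.2     # assumed image tag for illustration
      volumeMounts:
        - name: dags
          mountPath: /opt/airflow/dags
  volumes:
    - name: dags
      hostPath:
        path: /path/dags              # path on the node that runs the worker pod
  restartPolicy: Never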

-- Jarek Potiuk
Source: StackOverflow