I would like to deploy Airflow locally on Minikube and mount a local folder for the DAGs.
Airflow is deployed like this:
helm install $AIRFLOW_NAME apache-airflow/airflow \
--values values.yml \
--set logs.persistence.enabled=true \
--namespace $AIRFLOW_NAMESPACE \
--kubeconfig ~/.kube/config
The values.yml looks like this:
executor: KubernetesExecutor

config:
  core:
    dags_folder: /dags

webserver:
  extraVolumes:
    - name: dags
      hostPath:
        path: /path/dags
  extraVolumeMounts:
    - name: dags
      mountPath: /dags
Output of kubectl describe pods airflow-webserver --kubeconfig ~/.kube/config --namespace airflow:
Volumes:
  config:
    Type:        ConfigMap (a volume populated by a ConfigMap)
    Name:        airflow-airflow-config
    Optional:    false
  logs:
    Type:        PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:   airflow-logs
    ReadOnly:    false
  dags:
    Type:        HostPath (bare host directory volume)
    Path:        /path/dags/
    HostPathType:
  airflow-webserver-token-xtq9h:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  airflow-webserver-*
    Optional:    false
QoS Class:       BestEffort
The dags volume appears to be correctly mounted but remains empty. What could cause this behaviour?
Edit:
Output of kubectl describe pods airflow-scheduler-0 --kubeconfig ~/.kube/config --namespace airflow:
Mounts:
  /opt/airflow/airflow.cfg from config (ro,path="airflow.cfg")
  /opt/airflow/dags from dags (rw)
  /opt/airflow/logs from logs (rw)
  /opt/airflow/pod_templates/pod_template_file.yaml from config (ro,path="pod_template_file.yaml")
  /var/run/secrets/kubernetes.io/serviceaccount from airflow-scheduler-token-9zfpv (ro)
Volumes:
  config:
    Type:        ConfigMap (a volume populated by a ConfigMap)
    Name:        airflow-airflow-config
    Optional:    false
  dags:
    Type:        HostPath (bare host directory volume)
    Path:        /path/dags
    HostPathType:
  logs:
    Type:        PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:   airflow-logs
    ReadOnly:    false
  airflow-scheduler-token-9zfpv:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  airflow-scheduler-token-9zfpv
    Optional:    false
The reason why your mounted folder is empty is, I guess, that you are using Docker Desktop on macOS (that's my wild guess).
Your dags have to be present in the /path/dags folder on the host; however, with Docker Desktop on macOS (and similarly on Windows) your host is a virtual machine, not macOS/Windows itself.
In this case you need to make sure that /path/dags from your machine is also mapped into the virtual machine; by default only a few folders are mapped.
Via: https://docs.docker.com/desktop/mac/
"By default the /Users, /Volume, /private, /tmp and /var/folders directory are shared"
You can add new folders from your machine to be shared as well (see the "File sharing" chapter in the document above).
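A quick way to verify whether a host folder is actually shared with the Docker Desktop VM is to bind-mount it into a throwaway container and list it (a hedged check; /path/dags is the path from the question, and an unshared path typically fails with a "mounts denied" error):

docker run --rm -v /path/dags:/dags alpine ls -la /dags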
I was completely mistaking the hostPath parameter for a path on my local machine. hostPath refers to the Minikube node running the pod.
extraVolumes:
  - name: dags
    hostPath:
      path: /mnt/airflow/dags
      type: Directory
extraVolumeMounts:
  - name: dags
    mountPath: /opt/airflow/dags
This mounts a volume between the Minikube node and the pod. The path /mnt/airflow/dags refers to the node, so it does not need to exist on the local machine.
The local DAGs folder can then be mounted into the Minikube node:
minikube mount ./dags/:/mnt/airflow/dags
See: https://medium.com/@ipeluffo/running-apache-airflow-locally-on-kubernetes-minikube-31f308e3247a
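Note that minikube mount runs in the foreground and has to stay open while Airflow is running. To confirm the files actually reach the node, a hedged check:

# keep `minikube mount ./dags/:/mnt/airflow/dags` running in another terminal
minikube ssh -- ls -la /mnt/airflow/dags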
Assuming that you have some dags in /path/dags already, you should mount your dags folder to the scheduler, not to the webserver (if you are using Airflow 2). The scheduler is the one that parses dags; the webserver only displays them based on information stored in the DB, so it does not actually need the DAG files (it used to need them in Airflow 1.10 without serialization).
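To confirm the scheduler has actually picked them up, a hedged check assuming the Airflow 2 CLI is available in the scheduler image (pod name taken from the question; a multi-container pod may need -c to pick the scheduler container):

kubectl exec --namespace airflow airflow-scheduler-0 -- airflow dags list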
Also, I guess you should use LocalExecutor rather than KubernetesExecutor if you want to execute dags from a local folder; then the dags mounted to the scheduler will be available to the processes spawned from the scheduler in the same container. A minimal sketch of what that could look like follows below.
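This sketch assumes the official chart's scheduler.extraVolumes / scheduler.extraVolumeMounts keys and reuses the Minikube node path from the accepted answer:

executor: LocalExecutor

scheduler:
  extraVolumes:
    - name: dags
      hostPath:
        path: /mnt/airflow/dags
        type: Directory
  extraVolumeMounts:
    - name: dags
      mountPath: /opt/airflow/dags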
If you want to run the KubernetesExecutor and mount a host folder, I believe you will need to add it as a mount to your pod template file (you can generate such a pod template file using the Airflow CLI).
See https://airflow.apache.org/docs/apache-airflow/stable/executor/kubernetes.html#pod-template-file
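A hedged sketch of such a pod template file, assuming the KubernetesExecutor convention of naming the main container "base" and reusing the node path from the accepted answer (the image tag is illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: dummy-name
spec:
  restartPolicy: Never
  containers:
    - name: base                   # KubernetesExecutor expects the main container to be called "base"
      image: apache/airflow:2.2.0  # illustrative tag
      volumeMounts:
        - name: dags
          mountPath: /opt/airflow/dags
  volumes:
    - name: dags
      hostPath:
        path: /mnt/airflow/dags
        type: Directory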