I want to deploy Airflow on Kubernetes where pods have access to the same DAGs on a shared Persistent Volume. According to the documentation (https://github.com/helm/charts/tree/master/stable/airflow#using-one-volume-for-both-logs-and-dags), it seems I have to set and pass these values to Helm: extraVolume, extraVolumeMount, persistence.enabled, logsPersistence.enabled, dags.path, and logs.path.
Any custom values I pass when installing the official Helm chart result in errors similar to:
Error: YAML parse error on airflow/templates/deployments-web.yaml: error converting YAML to JSON: yaml: line 69: could not find expected ':'
A plain install works:
microk8s.helm install --namespace "airflow" --name "airflow" stable/airflow
Passing custom values on the command line fails:
microk8s.helm install --namespace "airflow" --name "airflow" stable/airflow \
--set airflow.extraVolumes=/home/*user*/github/airflowDAGs \
--set airflow.extraVolumeMounts=/home/*user*/github/airflowDAGs \
--set dags.path=/home/*user*/github/airflowDAGs/dags \
--set logs.path=/home/*user*/github/airflowDAGs/logs \
--set persistence.enabled=false \
--set logsPersistence.enabled=false
Passing a values file fails the same way:
microk8s.helm install --namespace "airflow" --name "airflow" stable/airflow --values=values_pv.yaml
with values_pv.yaml: https://pastebin.com/PryCgKnC. Change /home/*user*/github/airflowDAGs to a path on your machine to replicate the error. The relevant dags section of values.yaml:

## Configure DAGs deployment and update
dags:
  ##
  ## mount path for persistent volume.
  ## Note that this location is referred to in airflow.cfg, so if you change it, you must update airflow.cfg accordingly.
  path: /home/*user*/github/airflowDAGs/dags
How do I configure airflow.cfg in a Kubernetes deployment? In a non-containerized deployment of Airflow, this file can be found in ~/airflow/airflow.cfg.
The error points to line 69 of the chart's web deployment template: https://github.com/helm/charts/blob/master/stable/airflow/templates/deployments-web.yaml#L69, which contains git. Are the .yaml templates wrongly configured, so that the chart incorrectly tries to use git pull and fails because no git path is specified?
microk8s.kubectl version: v1.15.4
microk8s.helm version: v2.14.3
How do I correctly pass the right values to the Airflow Helm chart so I can deploy Airflow on Kubernetes with pods having access to the same DAGs and logs on a shared Persistent Volume?
Not sure if you have solved this yet, but if you haven't, I think there is a pretty simple way close to what you are doing.
All of the Deployments, Services, Pods need the persistent volume information - where it lives locally and where it should go within each kube kind. It looks like the values.yaml for the chart provides a way to do this. I'll only show this with dags below, but I think it should be roughly the same process for logs as well.
So the basic steps are, 1) tell kube where the 'volume' (directory) lives on your computer, 2) tell kube where to put that in your containers, and 3) tell airflow where to look for the dags. So, you can copy the values.yaml file from the helm repo and alter it with the following.
In the airflow section, you first need to create a volume containing the items in your local directory (this is the extraVolumes below). Then, that volume needs to be mounted; luckily, putting it here will template it into all the kube files. Once the volume is created, you should tell it to mount dags. So basically, extraVolumes creates the volume, and extraVolumeMounts mounts the volume.
airflow:
  extraVolumeMounts: # this will get the volume and mount it to that path in the container
    - name: dags
      mountPath: /usr/local/airflow/dags # location in the container where the directory mentioned below is mounted
  extraVolumes: # this will create the volume from the directory
    - name: dags
      hostPath:
        path: "path/to/local/directory" # For you this is something like /home/*user*/github/airflowDAGs/dags
Next, tell Airflow where to look for the DAGs via the config section:

airflow:
  config:
    AIRFLOW__CORE__DAGS_FOLDER: "/usr/local/airflow/dags" # this needs to match the mountPath in the extraVolumeMounts section
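As background, Airflow maps environment variables of the form AIRFLOW__{SECTION}__{KEY} onto the corresponding airflow.cfg entries, which is why setting AIRFLOW__CORE__DAGS_FOLDER here has the same effect as editing dags_folder under [core] in airflow.cfg. A tiny sketch of that naming convention (cfg_target is a hypothetical helper, not part of Airflow):

```python
def cfg_target(env_name: str):
    """Map an AIRFLOW__<SECTION>__<KEY> env var name to its airflow.cfg location."""
    prefix, section, key = env_name.split("__")
    assert prefix == "AIRFLOW"
    return section.lower(), key.lower()

# AIRFLOW__CORE__DAGS_FOLDER overrides dags_folder in the [core] section.
print(cfg_target("AIRFLOW__CORE__DAGS_FOLDER"))  # ('core', 'dags_folder')
```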
Then install the chart with your edited values.yaml file:
helm install --namespace "airflow" --name "airflow" -f local/path/to/values.yaml stable/airflow
In the end, this should allow airflow to see your local directory in the dags folder. If you add a new file, it should show up in the container - though it may take a minute to show up in the UI - I don't think the dagbag process is constantly running? Anyway, hope this helps!
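On the refresh delay: in Airflow 1.10 the scheduler rescans the DAGs directory periodically (dag_dir_list_interval, 300 seconds by default), so a new file can take a few minutes to appear in the UI. If that lag matters, one option is lowering the interval through the same config block (the value below is arbitrary, adjust to taste):

```yaml
airflow:
  config:
    AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL: "60"  # rescan the dags folder every 60s instead of the default 300s
```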
If we look at your values.yaml, there is a problem: you edited it the wrong way.
extraVolumeMounts: home/*user*/github/airflowDAGs
  ## Additional volumeMounts to the main containers in the Scheduler, Worker and Web pods.
  # - name: synchronised-dags
  #   mountPath: /usr/local/airflow/dags
extraVolumes: home/*user*/github/airflowDAGs
  ## Additional volumes for the Scheduler, Worker and Web pods.
  # - name: synchronised-dags
  #   emptyDir: {}
You can't just pass a path like that: extraVolumeMounts needs a name and a mountPath to work. That's why those lines are commented out with #. Delete the # characters, add your values, and it should work.
It should look like this:
extraVolumeMounts:
  - name: synchronised-dags
    mountPath: /usr/local/airflow/dags
extraVolumes:
  - name: synchronised-dags
    emptyDir: {}
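Note that emptyDir gives each pod scratch space that starts out empty; if the goal is for pods to see DAGs from a directory on the node, as in the question, a hostPath volume along the lines of the first answer may be closer to what you want (the path below is an assumption taken from the question):

```yaml
extraVolumeMounts:
  - name: synchronised-dags
    mountPath: /usr/local/airflow/dags
extraVolumes:
  - name: synchronised-dags
    hostPath:
      path: /home/*user*/github/airflowDAGs/dags
```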
That's the way you can install it:
1. Use helm fetch to download the airflow chart to your PC:
helm fetch stable/airflow --untar
2. Edit extraVolumeMounts and extraVolumes in airflow/values.yaml like in the example above; just add your name and path:
nano/vi/vim airflow/values.yaml
3. You can either change the rest of the settings in airflow/values.yaml and use:
helm install ./airflow --namespace "airflow" --name "airflow" -f ./airflow/values.yaml
OR use this command with just extraVolumeMounts and extraVolumes edited:
helm install --set dags.path=/home/user/github/airflowDAGs/dags --set logs.path=/home/user/github/airflowDAGs/logs --set persistence.enabled=false --set logsPersistence.enabled=false ./airflow --namespace "airflow" --name "airflow" -f ./airflow/values.yaml
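Before installing, it can help to sanity-check the edited values.yaml: the "could not find expected ':'" error is usually a YAML indentation mistake, and a mismatch between the name fields of extraVolumes and extraVolumeMounts will silently fail to mount. A quick illustrative check (list_names is a hypothetical helper, and the snippet is inlined so the example is self-contained):

```python
def list_names(yaml_text: str, key: str) -> set:
    """Collect the `- name:` values in the list nested under `key` (naive, indentation-based)."""
    names, active, key_indent = set(), False, 0
    for line in yaml_text.splitlines():
        stripped = line.strip()
        indent = len(line) - len(line.lstrip())
        if stripped.startswith(key + ":"):
            active, key_indent = True, indent
        elif active and stripped and indent <= key_indent:
            active = False  # we have left the key's indented block
        elif active and stripped.startswith("- name:"):
            names.add(stripped.split(":", 1)[1].strip())
    return names

VALUES_SNIPPET = """
airflow:
  extraVolumeMounts:
    - name: synchronised-dags
      mountPath: /usr/local/airflow/dags
  extraVolumes:
    - name: synchronised-dags
      emptyDir: {}
"""

mounts = list_names(VALUES_SNIPPET, "extraVolumeMounts")
volumes = list_names(VALUES_SNIPPET, "extraVolumes")
# every mount should reference a volume that is actually defined
assert mounts == volumes == {"synchronised-dags"}
print("volume names match:", sorted(mounts))
```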
Result:
NAME: airflow
LAST DEPLOYED: Fri Oct 11 09:18:46 2019
NAMESPACE: airflow
STATUS: DEPLOYED
RESOURCES:
==> v1/ConfigMap
NAME DATA AGE
airflow-env 20 2s
airflow-git-clone 1 2s
airflow-postgresql 0 2s
airflow-redis 3 2s
airflow-redis-health 3 2s
airflow-scripts 1 2s
==> v1/Deployment
NAME READY UP-TO-DATE AVAILABLE AGE
airflow-flower 0/1 1 0 1s
airflow-scheduler 0/1 1 0 1s
airflow-web 0/1 1 0 1s
==> v1/PersistentVolumeClaim
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
airflow-postgresql Pending standard 2s
==> v1/Pod(related)
NAME READY STATUS RESTARTS AGE
airflow-flower-5596b45d58-wrg74 0/1 ContainerCreating 0 1s
airflow-postgresql-75bf7d8774-dxxjn 0/1 Pending 0 1s
airflow-redis-master-0 0/1 ContainerCreating 0 1s
airflow-scheduler-8696d66bcf-bwm2s 0/1 ContainerCreating 0 1s
airflow-web-84797489f5-8wzsm 0/1 ContainerCreating 0 1s
airflow-worker-0 0/1 Pending 0 0s
==> v1/Secret
NAME TYPE DATA AGE
airflow-postgresql Opaque 1 2s
airflow-redis Opaque 1 2s
==> v1/Service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
airflow-flower ClusterIP 10.0.7.168 <none> 5555/TCP 1s
airflow-postgresql ClusterIP 10.0.8.62 <none> 5432/TCP 2s
airflow-redis-headless ClusterIP None <none> 6379/TCP 1s
airflow-redis-master ClusterIP 10.0.8.5 <none> 6379/TCP 1s
airflow-web ClusterIP 10.0.10.176 <none> 8080/TCP 1s
airflow-worker ClusterIP None <none> 8793/TCP 1s
==> v1/ServiceAccount
NAME SECRETS AGE
airflow 1 2s
==> v1/StatefulSet
NAME READY AGE
airflow-worker 0/1 1s
==> v1beta1/Deployment
NAME READY UP-TO-DATE AVAILABLE AGE
airflow-postgresql 0/1 1 0 1s
==> v1beta1/PodDisruptionBudget
NAME MIN AVAILABLE MAX UNAVAILABLE ALLOWED DISRUPTIONS AGE
airflow-pdb N/A 1 0 2s
==> v1beta1/Role
NAME AGE
airflow 2s
==> v1beta1/RoleBinding
NAME AGE
airflow 2s
==> v1beta2/StatefulSet
NAME READY AGE
airflow-redis-master 0/1 1s
NOTES:
Congratulations. You have just deployed Apache Airflow
export POD_NAME=$(kubectl get pods --namespace airflow -l "component=web,app=airflow" -o jsonpath="{.items[0].metadata.name}")
echo http://127.0.0.1:8080
kubectl port-forward --namespace airflow $POD_NAME 8080:8080
2. Open Airflow in your web browser