Airflow/k8s: How do I correctly set permissions for DAGs stored in a persistent volume?

1/31/2019

I would like to provide DAGs to all Kubernetes airflow pods (web, scheduler, workers) via a persistent volume, created with:

kubectl create -f pv-claim.yaml

pv-claim.yaml containing:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: airflow-pv-claim
  annotations:
    pv.beta.kubernetes.io/gid: "1000"
    pv.beta.kubernetes.io/uid: "1000"
spec:
  storageClassName: standard
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
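For reference, the pv.beta.kubernetes.io/gid annotation is documented on the PersistentVolume object itself rather than on the claim (and I have not found a documented uid counterpart). If you manage the volume yourself, it would sit on the PV, roughly like this (the NFS server/path are placeholders for whatever RWX-capable backend you use):

```yaml
kind: PersistentVolume
apiVersion: v1
metadata:
  name: airflow-pv
  annotations:
    pv.beta.kubernetes.io/gid: "1000"   # group applied to pods that mount this PV
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
  storageClassName: standard
  nfs:                                  # placeholder backend
    server: nfs.example.com
    path: /exports/airflow-dags
```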

The deployment command is then:

helm install --namespace my_name --name "airflow" stable/airflow --values ~my_name/airflow/charts/airflow/values.yaml

In the chart stable/airflow, values.yaml also allows for specification of persistence:

persistence:
  enabled: true
  existingClaim: airflow-pv-claim
  accessMode: ReadWriteMany
  size: 1Gi

But if I do

kubectl exec -it airflow-worker-0 -- /bin/bash
touch dags/hello.txt

I get a permission denied error.

I have tried hacking the airflow chart to set up an initContainer to chown dags/:

command: ["sh", "-c", "chown -R 1000:1000 /dags"]

which works for all pods except the workers (perhaps because they are created by flower?), as suggested at https://serverfault.com/a/907160/464205
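For completeness, the initContainer hack looked roughly like this (the volume name and mount path are from my setup and may not match the chart's exactly):

```yaml
# added under spec.template.spec in the relevant deployment templates
initContainers:
  - name: fix-dags-permissions
    image: busybox
    command: ["sh", "-c", "chown -R 1000:1000 /dags"]
    volumeMounts:
      - name: dags-data        # must match the pod's dags volume name
        mountPath: /dags
```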

I have also seen talk of fsGroup etc. - see e.g. Kubernetes NFS persistent volumes permission denied

I am trying to avoid editing the airflow charts (which seems to require hacks to at least two deployments-*.yaml files, plus one other), but perhaps this is unavoidable.

Punchline:

What is the easiest way to provision DAGs through a persistent volume to all airflow pods running on Kubernetes, with the correct permissions?

See also:

Persistent volume attached to k8s pod group

Kubernetes NFS persistent volumes permission denied [not clear to me how to integrate this with the airflow helm charts]

Kubernetes - setting custom permissions/file ownership per volume (and not per pod) [non-detailed, non-airflow-specific]

-- jtlz2
airflow
kubernetes

1 Answer

2/1/2019

It turns out that you do, I think, have to edit the airflow charts: add the following block to deployments-web.yaml and deployments-scheduler.yaml under spec.template.spec:

kind: Deployment
spec:
  template:
    spec:
      securityContext:
        runAsUser: 1000
        runAsGroup: 1000
        fsGroup: 1000
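If you would rather not edit the chart templates by hand, the same securityContext can be applied after helm install with kubectl patch; a sketch, assuming the deployments are named airflow-web and airflow-scheduler as in the chart:

```yaml
# securityContext-patch.yaml — apply with, e.g.:
#   kubectl patch deployment airflow-web -n my_name \
#     --patch "$(cat securityContext-patch.yaml)"
spec:
  template:
    spec:
      securityContext:
        runAsUser: 1000
        runAsGroup: 1000
        fsGroup: 1000
```

Note this survives only until the next helm upgrade re-renders the templates, so editing the chart remains the more durable fix.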

This allows one to copy DAGs into airflow using, e.g.:

kubectl cp my_dag.py my_namespace/airflow-worker-0:/usr/local/airflow/dags/
-- jtlz2
Source: StackOverflow