airflow kubernetes not reading pod_template_file

3/2/2020

I am running Airflow with the Kubernetes executor.

I have everything set up under the [kubernetes] section and things are working fine. However, I would prefer to use a pod template file for the workers.

So I generated a pod.yaml from one of the worker containers that spins up.
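
For reference, this is roughly how I dumped the spec with the kubernetes Python client (kubectl get pod -o yaml gives the same output; the pod name and namespace below are placeholders for my environment):

    # Dump a running worker pod's spec to YAML using the kubernetes Python
    # client, which is already an Airflow dependency. Name/namespace are placeholders.
    import yaml
    from kubernetes import client, config

    config.load_incluster_config()  # running from inside the cluster

    api = client.CoreV1Api()
    pod = api.read_namespaced_pod(name="airflow-worker-example", namespace="airflow")

    # Convert the model object into plain dicts so it can be written out as YAML
    serialized = client.ApiClient().sanitize_for_serialization(pod)
    with open("/opt/airflow/yamls/workerpod.yaml", "w") as f:
        yaml.safe_dump(serialized, f, default_flow_style=False)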

I have placed this file in a location accessible to the scheduler pod, something like

/opt/airflow/yamls/workerpod.yaml

But when I try to specify this file in the pod_template_file parameter, it gives me this error:

[2020-03-02 22:12:24,115] {pod_launcher.py:84} ERROR - Exception when attempting to create Namespaced Pod.
Traceback (most recent call last):
  File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/airflow/contrib/kubernetes/pod_launcher.py", line 81, in run_pod_async
    resp = self._client.create_namespaced_pod(body=req, namespace=pod.namespace, **kwargs)
  File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py", line 6115, in create_namespaced_pod
    (data) = self.create_namespaced_pod_with_http_info(namespace, body, **kwargs)
  File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py", line 6206, in create_namespaced_pod_with_http_info
    collection_formats=collection_formats)
  File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 334, in call_api
    _return_http_data_only, collection_formats, _preload_content, _request_timeout)
  File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 168, in __call_api
    _request_timeout=_request_timeout)
  File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 377, in request
    body=body)
  File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/kubernetes/client/rest.py", line 266, in POST
    body=body)
  File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/kubernetes/client/rest.py", line 222, in request
    raise ApiException(http_resp=r)
kubernetes.client.rest.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'ab2bc6dc-96f9-4014-8a08-7dae6e008aad', 'Cache-Control': 'no-store', 'Content-Type': 'application/json', 'Date': 'Mon, 02 Mar 2020 22:12:24 GMT', 'Content-Length': '660'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \"examplebashoperatorrunme0-c9ca5d619bc54bf2a456e133ad79dd00\" is forbidden: unable to validate against any security context constraint: [fsGroup: Invalid value: []int64{0}: 0 is not an allowed group spec.containers[0].securityContext.securityContext.runAsUser: Invalid value: 0: must be in the ranges: [1000040000, 1000049999] spec.containers[0].securityContext.securityContext.runAsUser: Invalid value: 0: running with the root UID is forbidden]","reason":"Forbidden","details":{"name":"examplebashoperatorrunme0-c9ca5d619bc54bf2a456e133ad79dd00","kind":"pods"},"code":403}
[2020-03-02 22:12:24,141] {kubernetes_executor.py:863} WARNING - ApiException when attempting to run task, re-queueing. Message: pods "examplebashoperatorrunme0-c9ca5d619bc54bf2a456e133ad79dd00" is forbidden: unable to validate against any security context constraint: [fsGroup: Invalid value: []int64{0}: 0 is not an allowed group spec.containers[0].securityContext.securityContext.runAsUser: Invalid value: 0: must be in the ranges: [1000040000, 1000049999] spec.containers[0].securityContext.securityContext.runAsUser: Invalid value: 0: running with the root UID is forbidden]

Just to clarify, the pod.yaml file was generated from the same running worker container that comes from the configs in the [kubernetes] section of airflow.cfg, which work just fine. The run-as user is correct. The service account is correct. But I still get this error.

I am unsure whether I should instead place this file relative to where I kick off my kubectl apply. Since the path goes into airflow.cfg, I didn't think that would be the case; rather, it should be accessible from within the scheduler container.
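
To rule out a simple path or parse problem, I ran a quick check from inside the scheduler container (PyYAML is already installed as an Airflow dependency):

    # Sanity check inside the scheduler container: the template exists, parses,
    # and carries the runAsUser I set.
    import os
    import yaml

    path = "/opt/airflow/yamls/workerpod.yaml"
    print(os.path.exists(path))  # True

    with open(path) as f:
        pod = yaml.safe_load(f)

    print(pod["spec"]["securityContext"]["runAsUser"])  # 1001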

One strange thing I noticed is that even though I have specified, and seem to be using, the KubernetesExecutor, the individual worker pods say LocalExecutor when they come up. That is one of the things I had changed to KubernetesExecutor in the workerpod.yaml file.
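
To double-check, I looked at what one of the spawned worker containers actually receives, rather than trusting the startup banner:

    # Run inside a worker pod: which executor setting did the container get?
    import os
    from airflow.configuration import conf

    print(os.environ.get("AIRFLOW__CORE__EXECUTOR"))  # LocalExecutor, not what the template sets
    print(conf.get("core", "executor"))               # same, since env vars take precedence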

Here is the pod YAML file:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    openshift.io/scc: nonroot
  labels:
    app: airflow-worker
    kubernetes_executor: "True"
  name: airflow-worker
#  namespace: airflow
spec:
  affinity: {}
  containers:
  - env:
    - name: AIRFLOW_HOME
      value: /opt/airflow
    - name: AIRFLOW__CORE__EXECUTOR
      value: KubernetesExecutor 
      #value: LocalExecutor
    - name: AIRFLOW__CORE__DAGS_FOLDER
      value: /opt/airflow/dags
    - name: AIRFLOW__CORE__SQL_ALCHEMY_CONN
      valueFrom:
        secretKeyRef:
          key: MYSQL_CONN_STRING
          name: db-secret
    image: ourrepo.example.com/airflow-lab:latest
    imagePullPolicy: IfNotPresent
    name: base
#    resources:
#      limits:
#        cpu: "1"
#        memory: 1Gi
#      requests:
#        cpu: 400m
#        memory: 1Gi
    securityContext:
      capabilities:
        drop:
        - KILL
        - MKNOD
        - SETGID
        - SETUID
    volumeMounts:
    - mountPath: /opt/airflow/dags
      name: airflow-dags
      readOnly: true
      subPath: airflow/dags
    - mountPath: /opt/airflow/logs
      name: airflow-logs
    - mountPath: /opt/airflow/airflow.cfg
      name: airflow-config
      readOnly: true
      subPath: airflow.cfg
#    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
#      name: airflow-cluster-access-token-5228g
#      readOnly: true
  dnsPolicy: ClusterFirst
#  imagePullSecrets:
#  - name: airflow-cluster-access-dockercfg-85twh
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext:
#    fsGroup: 0
    runAsUser: 1001
    seLinuxOptions:
      level: s0:c38,c12
  serviceAccount: airflow-cluster-access
  serviceAccountName: airflow-cluster-access
#  tolerations:
#  - effect: NoSchedule
#    key: node.kubernetes.io/memory-pressure
#    operator: Exists
  volumes:
  - name: airflow-dags
    persistentVolumeClaim:
      claimName: ucdagent
  - emptyDir: {}
    name: airflow-logs
  - configMap:
      defaultMode: 420
      name: airflow-config
    name: airflow-config
#  - name: airflow-cluster-access-token-5228g
#    secret:
#      defaultMode: 420
#      secretName: airflow-cluster-access-token-5228g

Here is the working [kubernetes] config from airflow.cfg:

    [kubernetes]
    #pod_template_file = /opt/airflow/yamls/workerpod.yaml
    dags_in_image = False
    worker_container_repository = ${AIRFLOW_IMAGE_NAME}
    worker_container_tag = ${AIRFLOW_IMAGE_TAG}
    worker_container_image_pull_policy = IfNotPresent
    delete_worker_pods = False
    in_cluster = true
    namespace = ${AIRFLOW_NAMESPACE}
    airflow_configmap = airflow-config
    run_as_user = 1001

    dags_volume_subpath = airflow/dags
    dags_volume_claim = ucdagent
    worker_service_account_name = airflow-cluster-access

    [kubernetes_secrets]
    AIRFLOW__CORE__SQL_ALCHEMY_CONN = db-secret=MYSQL_CONN_STRING

UPDATE: My Airflow version is 1.10.7. I am guessing this is a newer parameter. I am trying to find out whether this is currently just an unused config reference or whether it has been implemented in the latest release, which is 1.10.9 right now.
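
One crude way I checked from inside the container whether the installed executor code knows about this option at all (assuming the 1.10.x contrib module path):

    # Does the installed kubernetes executor module reference pod_template_file
    # anywhere? A rough check, but enough to see whether the option is wired up.
    import inspect
    from airflow.contrib.executors import kubernetes_executor

    src = inspect.getsource(kubernetes_executor)
    print("pod_template_file" in src)  # False on my 1.10.7 install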

UPDATE: This parameter has not been implemented as of 1.10.9.

-- John Test
airflow
kubernetes
openshift

0 Answers