How do I make my R file (launched from a KubernetesPodOperator) see my Kubernetes environment variables in Airflow?

7/31/2019

PLEASE SCROLL ALL THE WAY DOWN

I have a DAG running in Airflow that launches three KubernetesPodOperators, each of which runs an R file. For the past few days I've had a lot of trouble getting R to see environment variables. I'm passing my database username and password to the R files as environment variables, but the R files can't see them no matter what I do.

What I've tried so far: At first, I created a ConfigMap and a Secret and tried to use them from my R file, but I was probably doing it completely wrong (if you haven't noticed, I'm very new to Kubernetes and Airflow). Afterwards, I tried placing the environment variables in the config section of the Helm chart for my Airflow Docker container. That didn't work, so I tried placing the environment variables in airflow.cfg directly, which also didn't work.

What I'm trying to do now: I figured I probably need to pass the variables through the KubernetesPodOperator in my DAG so that my R files can see them, so I created a Kubernetes Secret and referenced it in my DAG file so that the launched pod can see it, but I'm not sure I'm doing it correctly. I followed the approach in this question: Accessing Kubernetes Secret from Airflow KubernetesPodOperator, but I keep getting a CreateContainerConfigError when the pod launches.

Also, if my pod can see the environment variables, does that mean the R file inside it can definitely see them too?
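For context on that last point: a process started inside a container normally inherits the container's environment, so an Rscript launched as the container's command should be able to read the variable with Sys.getenv("FETCHDB_USERNAME"). Below is a minimal sketch of that inheritance in Python rather than R, assuming a placeholder value ("example-user" is made up, not my real username) and a child Python process standing in for Rscript.

```python
import os
import subprocess
import sys


def read_in_child(name, value):
    """Start a child process with `name` set and return what the child sees."""
    # Copy the parent's environment and add the variable, which is roughly
    # what the container runtime does for the container's main process.
    env = dict(os.environ, **{name: value})
    child_code = "import os; print(os.environ.get({!r}, ''))".format(name)
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


# "example-user" is a placeholder value used only for this illustration.
seen = read_in_child("FETCHDB_USERNAME", "example-user")
print(seen)
```

If the variable shows up in kubectl describe pod but the script still reads an empty string, the name used inside the script is worth double-checking, since environment variable names are case-sensitive.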

This is how I placed the environment variables in airflow's helm chart config.

## Custom airflow configuration environment variables
  ## Use this to override any airflow setting settings defining environment variables in the
  ## following form: AIRFLOW__<section>__<key>.
  ## See the Airflow documentation: https://airflow.readthedocs.io/en/stable/howto/set-config.html?highlight=setting-configuration
  ## Example:
  ##   config:
  ##     AIRFLOW__CORE__EXPOSE_CONFIG: "True"
  ##     HTTP_PROXY: "http://proxy.mycompany.com:123"
  config: {
    ANOTHER_TEST: "MORE TESTING!!!",
    AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__FETCHDB_DB: "postgres",
    TEST: "testing!!!!!!!!!!!!"
  }
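
To make the naming convention from the chart comment concrete, here is a rough sketch of how a name of the form AIRFLOW__<section>__<key> breaks down. The helper name split_airflow_env_name is my own and is not part of Airflow. As far as I can tell, names without the AIRFLOW__ prefix (TEST, ANOTHER_TEST) are just plain environment variables on the Airflow containers, and pods started by a KubernetesPodOperator are separate pods that do not inherit them.

```python
def split_airflow_env_name(name):
    """Split AIRFLOW__<SECTION>__<KEY> into (section, key), or return None."""
    prefix = "AIRFLOW__"
    if not name.startswith(prefix):
        return None  # plain environment variable, not an Airflow setting
    section, sep, key = name[len(prefix):].partition("__")
    if not sep or not section or not key:
        return None  # malformed: section or key is missing
    return section.lower(), key.lower()


for env_name in (
    "AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__FETCHDB_DB",
    "ANOTHER_TEST",
    "TEST",
):
    print(env_name, "->", split_airflow_env_name(env_name))
```

My understanding, which I have not verified, is that the kubernetes_environment_variables section applies to worker pods created by the KubernetesExecutor rather than to pods created by a KubernetesPodOperator.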

This is how I'm trying to inject secrets into the KubernetesPodOperator:

import datetime

from airflow import models
from airflow.contrib.operators import kubernetes_pod_operator
from airflow.contrib.kubernetes.volume import Volume
from airflow.contrib.kubernetes.volume_mount import VolumeMount
from airflow.contrib.kubernetes import secret

YESTERDAY = datetime.datetime.now() - datetime.timedelta(days=1)

# Something with this is causing the fetch_db KubernetesPodOperator to fail
env_var_secret = secret.Secret(
    deploy_type='env',
    deploy_target='FETCHDB_USERNAME',
    secret='fetchdb-secret',
    key='username',
)

with models.DAG(
        dag_id='weekly_train',
        schedule_interval=datetime.timedelta(days=1),
        start_date=YESTERDAY) as dag:
    producer_id = 'zzz'
    run_id = dag.dag_id
    version = 1

    fetch_db = kubernetes_pod_operator.KubernetesPodOperator(
        task_id='fetch-db',
        name='fetch-db',
        namespace='airflow',
        image='localhost:32000/zz/z',
        image_pull_policy='Always',
        # image_pull_secrets='gcr',
        in_cluster=True,
        secrets=[env_var_secret],
        arguments=['Rscript', '/home/ruser/fetch_db.R', producer_id, '{{ run_id }}', str(version)]
    )

My secret seems fine:

root@ip-10-30-35-195:# kubectl describe secret/fetchdb-secret
Name:         fetchdb-secret
Namespace:    default
Labels:       <none>
Annotations:
Type:         Opaque

Data
====
password:  12 bytes
username:  5 bytes

The error that's causing my pod to fail:

root@ip-10-30-35-195:# kubectl logs -n airflow fetch-db-f226395c
Error from server (BadRequest): container "base" in pod "fetch-db-f226395c" is waiting to start: CreateContainerConfigError

I expect the pod to run properly. If I comment out the secrets=... line in the Python file, the pod doesn't fail.

Also, this is the status I see when I run kubectl get pods:

fetch-db-f226395c    0/1     CreateContainerConfigError     0      17m

What should I do?

EDIT: I ran kubectl describe pod on a pod with the CreateContainerConfigError, and this is what I got:

Name:         fetch-db-f226395c
Namespace:    airflow
Priority:     0
Node:         ip-10-30-35-195/10.30.35.195
Start Time:   Wed, 31 Jul 2019 15:00:33 +0000
Labels:       <none>
Annotations:  <none>
Status:       Pending
IP:           10.1.1.184
Containers:
  base:
    Container ID:
    Image:         localhost:32000/zz/z
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Args:
      Rscript
      /home/ruser/fetch_db.R
      zzz
      manual__2019-07-31T15:00:27.227397+00:00
      1
    State:          Waiting
      Reason:       CreateContainerConfigError
    Ready:          False
    Restart Count:  0
    Environment:
      FETCHDB_USERNAME:  <set to the key 'username' in secret 'fetchdb-secret'>  Optional: false
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-9nl85 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  default-token-9nl85:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-9nl85
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason   Age                     From                      Message
  ----    ------   ----                    ----                      -------
  Normal  Pulling  3m47s (x683 over 153m)  kubelet, ip-10-30-35-195  Pulling image "localhost:32000/zz/z"

So, it seems like the secret was injected properly, but it's still causing an error?

EDIT 2: It seems the issue was caused by my Secret after all. The Secret was in the default namespace, while my fetch-db pod was being created in the airflow namespace.
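
A pod can only reference Secrets in its own namespace, so the Secret has to exist in the namespace where the pod is created. Here is a minimal sketch that builds a Secret manifest for the airflow namespace. The credential values are placeholders (not my real ones), and build_secret_manifest is a helper name I made up. The printed JSON could be saved to a file and applied with kubectl apply -f.

```python
import base64
import json


def build_secret_manifest(name, namespace, values):
    """Build an Opaque Secret manifest with base64-encoded data values."""
    encoded = {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in values.items()
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": encoded,
    }


# Placeholder credentials for illustration only.
manifest = build_secret_manifest(
    "fetchdb-secret",
    "airflow",  # same namespace as the KubernetesPodOperator's `namespace`
    {"username": "example-user", "password": "example-pass"},
)
print(json.dumps(manifest, indent=2))
```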

EDIT 3: My last question: how exactly do I inject ConfigMaps into my KubernetesPodOperators? I created a ConfigMap, but R can't seem to see those variables.
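
One direction I am considering, sketched under assumptions: the contrib KubernetesPodOperator accepts an env_vars dict, and I believe later 1.10.x releases also accept a configmaps list of ConfigMap names whose keys are exposed as environment variables, though that should be checked against the installed Airflow version. The ConfigMap name fetchdb-config and the helper build_env_kwargs are hypothetical. The sketch only builds the keyword arguments so it runs without Airflow installed.

```python
def build_env_kwargs(configmap_names, literal_env=None):
    """Return keyword arguments that expose configuration as env variables."""
    return {
        # Assumption: `configmaps` is supported by the installed operator.
        # Each named ConfigMap must live in the same namespace as the pod.
        "configmaps": list(configmap_names),
        # Plain name/value pairs set directly on the container.
        "env_vars": dict(literal_env or {}),
    }


# "fetchdb-config" is a hypothetical ConfigMap name.
env_kwargs = build_env_kwargs(["fetchdb-config"], {"FETCHDB_DB": "postgres"})
print(env_kwargs)

# Assumed usage inside the DAG, alongside the existing arguments:
# fetch_db = kubernetes_pod_operator.KubernetesPodOperator(
#     task_id='fetch-db', name='fetch-db', namespace='airflow',
#     image='localhost:32000/zz/z', secrets=[env_var_secret], **env_kwargs)
```

Given what happened with the Secret, the ConfigMap would presumably also need to be created in the airflow namespace for the pod to find it.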

-- EMS
airflow
docker
kubernetes
r
