Airflow fails to write logs to S3 (v1.10.9)

2/13/2020

I am trying to set up remote logging with the stable/airflow Helm chart on Airflow v1.10.9, using the Kubernetes executor and the puckel/docker-airflow image. Here's my values.yaml file:

airflow:
  image:
    repository: airflow-docker-local
    tag: 1.10.9
  executor: Kubernetes
  service:
    type: LoadBalancer
  config:
    AIRFLOW__KUBERNETES__WORKER_CONTAINER_REPOSITORY: airflow-docker-local
    AIRFLOW__KUBERNETES__WORKER_CONTAINER_TAG: 1.10.9
    AIRFLOW__KUBERNETES__WORKER_CONTAINER_IMAGE_PULL_POLICY: Never
    AIRFLOW__KUBERNETES__WORKER_SERVICE_ACCOUNT_NAME: airflow
    AIRFLOW__KUBERNETES__DAGS_VOLUME_CLAIM: airflow
    AIRFLOW__KUBERNETES__NAMESPACE: airflow
    AIRFLOW__CORE__REMOTE_LOGGING: True
    AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER: "s3://xxx"
    AIRFLOW__CORE__REMOTE_LOG_CONN_ID: "s3://aws_access_key_id:aws_secret_access_key@bucket"
    AIRFLOW__CORE__ENCRYPT_S3_LOGS: False
persistence:
  enabled: true
  existingClaim: ''
postgresql:
  enabled: true
workers:
  enabled: false
redis:
  enabled: false
flower:
  enabled: false

But my logs don't get exported to S3; all I get in the UI is:

*** Log file does not exist: /usr/local/airflow/logs/icp_job_dag/icp-kube-job/2019-02-13T00:00:00+00:00/1.log
*** Fetching from: http://icpjobdagicpkubejob-f4144a374f7a4ac9b18c94f058bc7672:8793/log/icp_job_dag/icp-kube-job/2019-02-13T00:00:00+00:00/1.log
*** Failed to fetch log file from worker. HTTPConnectionPool(host='icpjobdagicpkubejob-f4144a374f7a4ac9b18c94f058bc7672', port=8793): Max retries exceeded with url: /log/icp_job_dag/icp-kube-job/2019-02-13T00:00:00+00:00/1.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f511c883710>: Failed to establish a new connection: [Errno -2] Name or service not known'))

Does anyone have more insight into what I could be missing?

Edit: following @trejas's suggestion below, I created a separate connection and am using that. Here's what the airflow config in my values.yaml looks like:

airflow:
  image:
    repository: airflow-docker-local
    tag: 1.10.9
  executor: Kubernetes
  service:
    type: LoadBalancer
  connections:
  - id: my_aws
    type: aws
    extra: '{"aws_access_key_id": "xxxx", "aws_secret_access_key": "xxxx", "region_name":"us-west-2"}'
  config:
    AIRFLOW__KUBERNETES__WORKER_CONTAINER_REPOSITORY: airflow-docker-local
    AIRFLOW__KUBERNETES__WORKER_CONTAINER_TAG: 1.10.9
    AIRFLOW__KUBERNETES__WORKER_CONTAINER_IMAGE_PULL_POLICY: Never
    AIRFLOW__KUBERNETES__WORKER_SERVICE_ACCOUNT_NAME: airflow
    AIRFLOW__KUBERNETES__DAGS_VOLUME_CLAIM: airflow
    AIRFLOW__KUBERNETES__NAMESPACE: airflow

    AIRFLOW__CORE__REMOTE_LOGGING: True
    AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER: s3://airflow.logs
    AIRFLOW__CORE__REMOTE_LOG_CONN_ID: my_aws
    AIRFLOW__CORE__ENCRYPT_S3_LOGS: False

I still have the same issue.

-- Asav Patel
airflow
kubernetes
kubernetes-helm

2 Answers

2/13/2020

Your remote log conn ID needs to be the ID of a connection defined in the connections form/list, not a connection string.

https://airflow.apache.org/docs/stable/howto/write-logs.html

https://airflow.apache.org/docs/stable/howto/connection/index.html
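
If you want to sanity-check that the connection itself works before wiring it into logging, a quick test from inside the Airflow container might look like this (a sketch, assuming the connection ID my_aws and the bucket name from the question):

from airflow.hooks.S3_hook import S3Hook

# Uses the Airflow connection rather than raw credentials; if this
# upload succeeds, the S3 task log handler can write with it too.
hook = S3Hook(aws_conn_id='my_aws')
hook.load_string('remote logging test', key='connection_test.txt',
                 bucket_name='airflow.logs', replace=True)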

-- trejas
Source: StackOverflow

5/1/2020

I was running into the same issue and thought I'd follow up with what ended up working for me. The connection is correct, but you need to make sure the worker pods have the same environment variables:

airflow:
  image:
    repository: airflow-docker-local
    tag: 1.10.9
  executor: Kubernetes
  service:
    type: LoadBalancer
  connections:
  - id: my_aws
    type: aws
    extra: '{"aws_access_key_id": "xxxx", "aws_secret_access_key": "xxxx", "region_name":"us-west-2"}'
  config:
    AIRFLOW__KUBERNETES__WORKER_CONTAINER_REPOSITORY: airflow-docker-local
    AIRFLOW__KUBERNETES__WORKER_CONTAINER_TAG: 1.10.9
    AIRFLOW__KUBERNETES__WORKER_CONTAINER_IMAGE_PULL_POLICY: Never
    AIRFLOW__KUBERNETES__WORKER_SERVICE_ACCOUNT_NAME: airflow
    AIRFLOW__KUBERNETES__DAGS_VOLUME_CLAIM: airflow
    AIRFLOW__KUBERNETES__NAMESPACE: airflow

    AIRFLOW__CORE__REMOTE_LOGGING: True
    AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER: s3://airflow.logs
    AIRFLOW__CORE__REMOTE_LOG_CONN_ID: my_aws
    AIRFLOW__CORE__ENCRYPT_S3_LOGS: False
    AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__AIRFLOW__CORE__REMOTE_LOGGING: True
    AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__AIRFLOW__CORE__REMOTE_LOG_CONN_ID: my_aws
    AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER: s3://airflow.logs
    AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__AIRFLOW__CORE__ENCRYPT_S3_LOGS: False
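
A quick way to confirm these overrides actually reach the workers is to exec into a running worker pod and ask Airflow what it parsed (a sketch using the 1.10 configuration API; section and option names as in the config above):

from airflow.configuration import conf

# Each value should reflect what was pushed through the
# AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__* settings above.
print(conf.getboolean('core', 'remote_logging'))    # True
print(conf.get('core', 'remote_log_conn_id'))       # my_aws
print(conf.get('core', 'remote_base_log_folder'))   # s3://airflow.logs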

I also had to set the fernet key for the workers (and in general); otherwise I got an invalid token error:

airflow:
  fernet_key: "abcdefghijkl1234567890zxcvbnmasdfghyrewsdsddfd="

  config:
    AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__AIRFLOW__CORE__FERNET_KEY: "abcdefghijkl1234567890zxcvbnmasdfghyrewsdsddfd="
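
The key above is presumably a placeholder; a real one can be generated with the cryptography package that Airflow already depends on:

from cryptography.fernet import Fernet

# Prints a urlsafe base64-encoded 32-byte key, the format Airflow
# expects for fernet_key.
print(Fernet.generate_key().decode())
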
-- ltken123
Source: StackOverflow