Airflow Kubernetes Executor logs

11/29/2019

I've deployed an Airflow instance on Kubernetes using the stable/airflow helm chart. I slightly modified the puckel/docker-airflow image to be able to install the Kubernetes executor. All tasks are now being executed successfully on our Kubernetes cluster, but the logs of these tasks are nowhere to be found.

I would like to upload the logs to our Azure Blob Storage account. I've configured my environment variables like this:

AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER="wasb-airflow"
AIRFLOW__CORE__REMOTE_LOG_CONN_ID="wasb_default"
AIRFLOW__CORE__REMOTE_LOGGING="True"
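
For anyone checking the variable names: Airflow treats an environment variable named AIRFLOW__{SECTION}__{KEY} as an override for that key in the matching section of airflow.cfg. Below is a minimal sketch of that mapping, useful for spotting a typo in a name. The helper name to_cfg_option is made up for illustration and is not part of Airflow.

```python
def to_cfg_option(env_name):
    """Split AIRFLOW__{SECTION}__{KEY} into (section, key), lowercased.

    Hypothetical helper for illustration only; not an Airflow API.
    """
    parts = env_name.split("__", 2)
    if len(parts) != 3 or parts[0] != "AIRFLOW" or not parts[1] or not parts[2]:
        raise ValueError(f"not an Airflow config override: {env_name!r}")
    return parts[1].lower(), parts[2].lower()


# The three settings from the question.
settings = {
    "AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER": "wasb-airflow",
    "AIRFLOW__CORE__REMOTE_LOG_CONN_ID": "wasb_default",
    "AIRFLOW__CORE__REMOTE_LOGGING": "True",
}

for name, value in settings.items():
    section, key = to_cfg_option(name)
    print(f"[{section}] {key} = {value}")
```

All three names map to the [core] section, which is where the remote logging options live in the Airflow 1.10 line used here.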

The wasb_default connection includes a login and password for the Azure Blob Storage account. I've tested this connection using a WasbHook and was able to delete a dummy file successfully.

When I try to view the logs, this message is displayed:

*** Log file does not exist: /usr/local/airflow/logs/example_python_operator/print_the_context/2019-11-29T15:42:25+00:00/1.log
*** Fetching from: http://examplepythonoperatorprintthecontext-4a6e6a1f11fd431f8c2a1dc081:8793/log/example_python_operator/print_the_context/2019-11-29T15:42:25+00:00/1.log
*** Failed to fetch log file from worker. HTTPConnectionPool(host='examplepythonoperatorprintthecontext-4a6e6a1f11fd431f8c2a1dc081', port=8793): Max retries exceeded with url: /log/example_python_operator/print_the_context/2019-11-29T15:42:25+00:00/1.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f34ecdbe990>: Failed to establish a new connection: [Errno -2] Name or service not known'))
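
The last line is a name-resolution failure: the webserver cannot resolve the hostname taken from the "Fetching from" URL. A rough sketch for checking this by hand, assuming it is run from inside the webserver pod; the function names worker_host and resolves are illustrative only.

```python
import socket
from urllib.parse import urlparse


def worker_host(fetch_url):
    """Return the hostname part of the 'Fetching from' URL."""
    return urlparse(fetch_url).hostname


def resolves(hostname):
    """Return True if the name resolves from wherever this script runs."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        # Same failure class as "[Errno -2] Name or service not known".
        return False


if __name__ == "__main__":
    url = (
        "http://examplepythonoperatorprintthecontext-4a6e6a1f11fd431f8c2a1dc081:8793"
        "/log/example_python_operator/print_the_context/2019-11-29T15:42:25+00:00/1.log"
    )
    host = worker_host(url)
    print(host, "resolves" if resolves(host) else "does not resolve")
```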

Any ideas on how to solve this problem?

-- Wouter Goossens
airflow
azure
kubernetes
logging

2 Answers

4/4/2020

Sorry for the late reply. I recently faced this issue and was able to solve it with this answer.

I have a working setup in my repo here; you can check it out if you want. It uses a PV (persistent volume) to store the logs, and you can add the connection in airflow.yaml to send the logs to the remote folder.

-- midNight
Source: StackOverflow

12/2/2019

Found the solution. Increase the AIRFLOW__WEBSERVER__LOG_FETCH_TIMEOUT_SEC environment variable (the default is 5 seconds) to something like 15.

-- Wouter Goossens
Source: StackOverflow