I've deployed an Airflow instance on Kubernetes using the stable/airflow Helm chart. I slightly modified the puckel/docker-airflow image so that the Kubernetes executor can be used. All tasks now execute successfully on our Kubernetes cluster, but the logs of these tasks are nowhere to be found.
I would like to upload the logs to our Azure Blob Storage account. I've configured my environment variables like this:
AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER="wasb-airflow"
AIRFLOW__CORE__REMOTE_LOG_CONN_ID="wasb_default"
AIRFLOW__CORE__REMOTE_LOGGING="True"
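For reference, these variables follow Airflow's AIRFLOW__{SECTION}__{KEY} convention, so on this 1.10-style setup the equivalent airflow.cfg entries would be:

[core]
remote_logging = True
remote_base_log_folder = wasb-airflow
remote_log_conn_id = wasb_default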
The wasb_default connection includes a login and password for the Azure Blob Storage account. I've tested this connection using a WasbHook and successfully deleted a dummy file.
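The test looked roughly like this (a minimal sketch; the container and blob names are placeholders, and the import path is the Airflow 1.10 contrib one):

from airflow.contrib.hooks.wasb_hook import WasbHook

# Exercise the wasb_default connection: upload a dummy blob, then delete it.
hook = WasbHook(wasb_conn_id="wasb_default")
hook.load_string("test", container_name="some-container", blob_name="dummy.txt")
hook.delete_file(container_name="some-container", blob_name="dummy.txt")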
When I try to view the logs, this message is displayed:
*** Log file does not exist: /usr/local/airflow/logs/example_python_operator/print_the_context/2019-11-29T15:42:25+00:00/1.log
*** Fetching from: http://examplepythonoperatorprintthecontext-4a6e6a1f11fd431f8c2a1dc081:8793/log/example_python_operator/print_the_context/2019-11-29T15:42:25+00:00/1.log
*** Failed to fetch log file from worker. HTTPConnectionPool(host='examplepythonoperatorprintthecontext-4a6e6a1f11fd431f8c2a1dc081', port=8793): Max retries exceeded with url: /log/example_python_operator/print_the_context/2019-11-29T15:42:25+00:00/1.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f34ecdbe990>: Failed to establish a new connection: [Errno -2] Name or service not known'))
Any ideas on how to solve this problem?
Sorry for the late reply. I recently faced this issue and was able to solve it with this answer. I have a working setup in my repo here; you can check it out if you want. It uses a PersistentVolume to store logs, and you can add the connection in airflow.yaml to send logs to the remote folder.
Found the solution: increase the AIRFLOW__WEBSERVER__LOG_FETCH_TIMEOUT_SEC environment variable to something like 15.
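In the same style as the variables above, for example:

AIRFLOW__WEBSERVER__LOG_FETCH_TIMEOUT_SEC="15"

This corresponds to log_fetch_timeout_sec in the [webserver] section of airflow.cfg.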