I am setting up some ETL pipelines on Google's Cloud Composer (Airflow), deployed on a 3-node GKE cluster. The minimum for Composer on GCP!
Version : 1.10.1-composer
GCP Image version: composer-1.6.0-airflow-1.10.1
I would normally log into the Airflow machine and debug via IPython, but this is difficult on a GKE setup. I can't seem to SSH into the proper place to run interactive tests to debug.
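For context, the kind of interactive access I'm after would look roughly like this on GKE (the cluster, zone, project, namespace, and pod names below are placeholders, and the worker pod naming may differ per Composer environment):

```shell
# Placeholders throughout -- substitute your own project/cluster/pod names.
# Fetch kubectl credentials for the Composer-managed GKE cluster.
gcloud container clusters get-credentials CLUSTER_NAME --zone ZONE --project PROJECT_ID

# Find the Airflow worker pod (the namespace varies per Composer environment).
kubectl get pods --all-namespaces | grep airflow-worker

# Open a shell in the worker, then run tasks interactively, e.g.:
#   airflow test gsheet_test pull_gsheet 2019-10-20
kubectl exec -it AIRFLOW_WORKER_POD --namespace NAMESPACE -- /bin/bash
```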
PythonOperator callable, using GSheetHook:
def pull_sheet(execution_date=None):
    hook = GSheetHook()
    sheet_data = hook.get_values_df(
        'SHEET_ID_XXXXX',
        'EXAMPLEXXXXX!A1:J4305',
        shape_column=None,
    )
    print(sheet_data)  # note: `return print(...)` would just return None
Stale state in the Airflow logs: the task has been left for a day (24 hours) with no timeout error, no error of any kind, and it is never marked for retry. The snippet below, plus a screenshot of the scheduler logs, is the only information I have about the running task. From the scheduler logs it looks like the task keeps running without acknowledging any state changes... Airflow logs:
[2019-10-21 13:15:07,431] {models.py:1361} INFO - Dependencies all met for <TaskInstance: gsheet_test.pull_gsheet 2019-10-20T02:00:00+00:00 [queued]>
[2019-10-21 13:15:07,441] {models.py:1361} INFO - Dependencies all met for <TaskInstance: gsheet_test.pull_gsheet 2019-10-20T02:00:00+00:00 [queued]>
[2019-10-21 13:15:07,442] {models.py:1573} INFO -
-------------------------------------------------------------------------------
Starting attempt 1 of
-------------------------------------------------------------------------------
[2019-10-21 13:15:07,490] {models.py:1595} INFO - Executing <Task(PythonOperator): pull_gsheet> on 2019-10-20T02:00:00+00:00
[2019-10-21 13:15:07,491] {base_task_runner.py:118} INFO - Running: ['bash', '-c', 'airflow run gsheet_test pull_gsheet 2019-10-20T02:00:00+00:00 --job_id 70970 --raw -sd DAGS_FOLDER/gsheet_test.py --cfg_path /tmp/tmp3xukhrnx']
Any help is appreciated!!