I am working on deploying a GitLab CI pipeline that triggers a Google Cloud Composer DAG. Below is the .yaml I wrote:
stages:
  - deploy

deploy:
  stage: deploy
  image: google/cloud-sdk
  script:
    - apt-get update && apt-get --only-upgrade install kubectl google-cloud-sdk
    - gcloud config set project $GCP_PROJECT_ID
    - gsutil cp plugins/*.py ${PLUGINS_BUCKET}
    - gsutil cp dags/*.py ${DAGS_BUCKET}
    - kubectl get pods
    - gcloud composer environments run ${COMPOSER_ENVIRONMENT} --location ${ENVIRONMENT_LOCATION} trigger_dag -- ${DAG_NAME}
Unfortunately, the execution of the pipeline fails with the error below:
$ gcloud config set project $GCP_PROJECT_ID
Updated property [core/project].
$ gsutil cp plugins/*.py ${PLUGINS_BUCKET}
Copying file://plugins/dataproc_custom_operators.py [Content-Type=text/x-python]...
/ [0 files][ 0.0 B/ 2.3 KiB]
/ [1 files][ 2.3 KiB/ 2.3 KiB]
Operation completed over 1 objects/2.3 KiB.
$ gsutil cp dags/*.py ${DAGS_BUCKET}
Copying file://dags/frrm_infdeos_workflow.py [Content-Type=text/x-python]...
/ [0 files][ 0.0 B/ 3.3 KiB]
/ [1 files][ 3.3 KiB/ 3.3 KiB]
Operation completed over 1 objects/3.3 KiB.
$ gcloud composer environments run ${COMPOSER_ENVIRONMENT} --location ${ENVIRONMENT_LOCATION} trigger_dag -- ${DAG_NAME}
kubeconfig entry generated for europe-west1-nameenvironment-a5456e0c-gke.
ERROR: (gcloud.composer.environments.run) No running GKE pods found. If the environment was recently started, please wait and retry.
ERROR: Job failed: command terminated with exit code 1
Do you have any idea how to fix this, please? Best regards
I had the same problem as @scalacode. For me, the cause was that the gitlab-runner was running in a different GCP project than the Composer environment, so the command failed without surfacing that as the actual error. Running the gitlab-runner in the same project as the Composer environment fixed the issue. See the sanity check below.
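If you cannot move the runner right away, a quick check along these lines (a sketch only, reusing the variable names from the original pipeline) can at least surface the mismatch in the job log:

# Sanity check (sketch): print the project the runner is configured for,
# then confirm the Composer environment is actually visible from it.
gcloud config get-value project
gcloud composer environments list --locations ${ENVIRONMENT_LOCATION}

If the environment does not show up in the second command's output, the job is pointed at the wrong project.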
It seems Composer is unable to retrieve information about the pods/GKE cluster. This could happen for a number of reasons, ranging from the GKE cluster failing to create nodes to the pods being stuck in a crash loop.
I notice the script never runs “get-credentials” to authenticate to the cluster. When running commands against a GKE cluster from the CLI, you traditionally have to authenticate to the cluster first. To find the cluster backing a Composer environment:
gcloud composer environments describe ${COMPOSER_ENVIRONMENT} --location ${ENVIRONMENT_LOCATION} --format="get(config.gkeCluster)"
This will return something of the form projects/PROJECT/zones/ZONE/clusters/CLUSTER. Then run:
gcloud container clusters get-credentials ${CLUSTER} --zone ${ZONE}
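In the pipeline itself, the two commands could be combined roughly as below, placed before the kubectl and trigger_dag steps. This is a sketch only: it assumes the describe output keeps the projects/PROJECT/zones/ZONE/clusters/CLUSTER shape shown above, and the GKE_CLUSTER_PATH, CLUSTER_ZONE, and CLUSTER_NAME variable names are illustrative. GitLab concatenates the script lines into one shell, so the variables carry over between lines.

# Sketch: derive the GKE cluster backing the Composer environment,
# then authenticate to it before running kubectl or trigger_dag.
GKE_CLUSTER_PATH=$(gcloud composer environments describe ${COMPOSER_ENVIRONMENT} --location ${ENVIRONMENT_LOCATION} --format="get(config.gkeCluster)")
# The path looks like projects/PROJECT/zones/ZONE/clusters/CLUSTER,
# so fields 4 and 6 hold the zone and the cluster name.
CLUSTER_ZONE=$(echo "$GKE_CLUSTER_PATH" | cut -d'/' -f4)
CLUSTER_NAME=$(echo "$GKE_CLUSTER_PATH" | cut -d'/' -f6)
gcloud container clusters get-credentials "$CLUSTER_NAME" --zone "$CLUSTER_ZONE"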
Once the script has authenticated to the cluster, see whether it now completes. If not, try running kubectl get pods to see what is happening with the pods and whether they exist at all.
If you see many pods restarting or generally not in the “running/completed” state, the issue could be with the pod configuration. If you don’t see any pods at all, the deployment may have failed; check it with the command kubectl get deployments.
The deployments airflow-scheduler, airflow-sqlproxy, and airflow-worker should be present. If those three deployments are not present, the environment was likely tampered with, and it would be easiest to create a new environment.