How to spin up a Cloud SQL Proxy for a Cloud Composer cluster
Currently we use Airflow to manage jobs and to create DAGs dynamically. For this, a separate DAG checks a PostgreSQL table for existing rules, and depending on whether a rule is active or inactive in PostgreSQL, we have manually set things up to switch the dynamic DAGs off or on in Airflow. Now we are going to use Google's managed Cloud Composer, but the problem is that we don't have access to Cloud Composer's database. How can we use the Cloud SQL Proxy to resolve this problem?
The Cloud Composer database is actually already accessible, because there is a Cloud SQL Proxy running within the environment's attached GKE cluster. You can use its service name, airflow-sqlproxy-service, to connect to it from within the cluster as the user root. For example, on Composer 1.6.0, if you have Kubernetes cluster credentials, you can list the running pods:
$ kubectl get po --all-namespaces
composer-1-6-0-airflow-1-9-0-6f89fdb7 airflow-database-init-job-kprd5 0/1 Completed 0 1d
composer-1-6-0-airflow-1-9-0-6f89fdb7 airflow-scheduler-78d889459b-254fm 2/2 Running 18 1d
composer-1-6-0-airflow-1-9-0-6f89fdb7 airflow-worker-569bc59df5-x6jhl 2/2 Running 5 1d
composer-1-6-0-airflow-1-9-0-6f89fdb7 airflow-worker-569bc59df5-xxqk7 2/2 Running 5 1d
composer-1-6-0-airflow-1-9-0-6f89fdb7 airflow-worker-569bc59df5-z5lnj 2/2 Running 5 1d
default airflow-redis-0 1/1 Running 0 1d
default airflow-sqlproxy-668fdf6c4-vxbbt 1/1 Running 0 1d
default composer-agent-6f89fdb7-0a7a-41b6-8d98-2dbe9f20d7ed-j9d4p 0/1 Completed 0 1d
default composer-fluentd-daemon-g9mgg 1/1 Running 326 1d
default composer-fluentd-daemon-qgln5 1/1 Running 325 1d
default composer-fluentd-daemon-wq5z5 1/1 Running 326 1d
You can see that one of the worker pods is named airflow-worker-569bc59df5-x6jhl and is running in the namespace composer-1-6-0-airflow-1-9-0-6f89fdb7. If I exec into one of them with kubectl and run the MySQL CLI, I have access to the database:
$ kubectl exec \
-it airflow-worker-569bc59df5-x6jhl \
--namespace=composer-1-6-0-airflow-1-9-0-6f89fdb7 -- \
mysql \
-u root \
-h airflow-sqlproxy-service.default
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 27147
Server version: 5.7.14-google-log (Google)
Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
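The two steps above (pick a running worker pod, then exec the MySQL client inside it) could be scripted roughly as follows. This is only a sketch: the helper names find_worker_pod, mysql_exec_command and open_mysql_shell are made up for illustration, and the parsing assumes the whitespace-separated column order shown in the listing above (namespace, pod name, ready count, status).

```python
import subprocess


def find_worker_pod(pod_listing):
    """Return (namespace, pod_name) of the first running airflow-worker pod.

    Assumes each row is whitespace-separated with the namespace in the
    first column, the pod name in the second and the status in the fourth.
    Returns None when no matching row is found.
    """
    for line in pod_listing.splitlines():
        cols = line.split()
        if (len(cols) >= 4
                and cols[1].startswith("airflow-worker-")
                and cols[3] == "Running"):
            return cols[0], cols[1]
    return None


def mysql_exec_command(namespace, pod,
                       host="airflow-sqlproxy-service.default", user="root"):
    """Build the kubectl argv that starts the MySQL CLI inside the pod."""
    return [
        "kubectl", "exec", "-it", pod,
        "--namespace=" + namespace, "--",
        "mysql", "-u", user, "-h", host,
    ]


def open_mysql_shell():
    """List the pods, pick a worker and hand the terminal to mysql."""
    listing = subprocess.run(
        ["kubectl", "get", "po", "--all-namespaces"],
        capture_output=True, text=True, check=True,
    ).stdout
    found = find_worker_pod(listing)
    if found is None:
        raise RuntimeError("no running airflow-worker pod found")
    subprocess.run(mysql_exec_command(*found), check=True)
```

Calling open_mysql_shell() from a machine that already has cluster credentials should land in the same mysql> prompt as the manual command.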
TL;DR: for anything running in your DAGs, connect using root@airflow-sqlproxy-service.default with no password. This will connect to the Airflow metadata database through the Cloud SQL Proxy that's already running in your Composer environment.
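For code inside a DAG, those coordinates can be packed into a connection URI and handed to whatever MySQL client you use. The helper below is a hypothetical sketch, not part of Airflow or Composer: the database name is not given in this answer, so it is left as a parameter, and port 3306 is assumed only because it is the MySQL default.

```python
def metadata_db_uri(database,
                    host="airflow-sqlproxy-service.default",
                    user="root",
                    port=3306):
    """Build a MySQL URI for the Airflow metadata DB behind the built-in proxy.

    There is no password component because root connects without one.
    The port is an assumption (MySQL default), and the database name
    must be supplied by the caller.
    """
    return "mysql://{user}@{host}:{port}/{database}".format(
        user=user, host=host, port=port, database=database
    )
```

A URI in this form is the usual shape accepted by SQLAlchemy-style clients; check the database name in your own environment before relying on it.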
If you need to connect to a database that isn't the Airflow database running in Cloud SQL, then you can spin up another proxy by deploying a new proxy pod into GKE (like you would deploy anything else into a Kubernetes cluster).
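As a rough illustration of that last option, the sketch below generates a Deployment manifest for an extra proxy as JSON, which kubectl apply -f accepts. Several things here are assumptions rather than facts from this answer: the name extra-sqlproxy, the image tag, the v1 proxy's /cloud_sql_proxy binary with its -instances flag, port 5432 (chosen because the question mentions PostgreSQL), and that the cluster's node service account is allowed to connect to the target Cloud SQL instance.

```python
import json


def proxy_deployment(instance_connection_name,
                     port=5432,
                     image="gcr.io/cloudsql-docker/gce-proxy:1.16",
                     name="extra-sqlproxy"):
    """Return a Deployment manifest (as a dict) for one Cloud SQL Proxy pod.

    instance_connection_name is the PROJECT:REGION:INSTANCE string of the
    target database. Image, name and port are illustrative defaults.
    """
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "cloudsql-proxy",
                        "image": image,
                        # Listen on all interfaces so other pods can reach it.
                        "command": [
                            "/cloud_sql_proxy",
                            "-instances={}=tcp:0.0.0.0:{}".format(
                                instance_connection_name, port),
                        ],
                        "ports": [{"containerPort": port}],
                    }],
                },
            },
        },
    }


def manifest_json(instance_connection_name):
    """Serialize the manifest so it can be piped to kubectl apply -f -."""
    return json.dumps(proxy_deployment(instance_connection_name), indent=2)
```

To reach the new proxy by a stable DNS name, as with airflow-sqlproxy-service, you would also need a Service in front of the pod; that part is left out here.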