How to spin up a Cloud SQL Proxy in a Cloud Composer cluster

4/3/2019

We currently use Airflow to manage jobs and to create DAGs dynamically. A separate DAG checks a table in PostgreSQL for existing rules, and depending on whether a rule is active or inactive in PostgreSQL, we have manually set things up to switch the dynamic DAGs off or on in Airflow. We are now moving to Google's managed Cloud Composer, but the problem is that we don't have access to Cloud Composer's database. How can we use the Cloud SQL Proxy to solve this problem?

-- Aniruddha Dwivedi
cloud-sql-proxy
google-cloud-composer
google-cloud-platform
kubernetes

1 Answer

4/3/2019

The Cloud Composer database is actually already accessible, because there is a Cloud SQL Proxy running within the environment's attached GKE cluster. You can use its service name, airflow-sqlproxy-service, to connect to it from within the cluster as the root user. For example, on Composer 1.6.0, if you have Kubernetes cluster credentials, you can list the running pods:

$ kubectl get po --all-namespaces
composer-1-6-0-airflow-1-9-0-6f89fdb7   airflow-database-init-job-kprd5                                  0/1     Completed   0          1d
composer-1-6-0-airflow-1-9-0-6f89fdb7   airflow-scheduler-78d889459b-254fm                               2/2     Running     18         1d
composer-1-6-0-airflow-1-9-0-6f89fdb7   airflow-worker-569bc59df5-x6jhl                                  2/2     Running     5          1d
composer-1-6-0-airflow-1-9-0-6f89fdb7   airflow-worker-569bc59df5-xxqk7                                  2/2     Running     5          1d
composer-1-6-0-airflow-1-9-0-6f89fdb7   airflow-worker-569bc59df5-z5lnj                                  2/2     Running     5          1d
default                                 airflow-redis-0                                                  1/1     Running     0          1d
default                                 airflow-sqlproxy-668fdf6c4-vxbbt                                 1/1     Running     0          1d
default                                 composer-agent-6f89fdb7-0a7a-41b6-8d98-2dbe9f20d7ed-j9d4p        0/1     Completed   0          1d
default                                 composer-fluentd-daemon-g9mgg                                    1/1     Running     326        1d
default                                 composer-fluentd-daemon-qgln5                                    1/1     Running     325        1d
default                                 composer-fluentd-daemon-wq5z5                                    1/1     Running     326        1d

You can see that one of the worker pods is named airflow-worker-569bc59df5-x6jhl and is running in the namespace composer-1-6-0-airflow-1-9-0-6f89fdb7. If I use kubectl exec to run the MySQL CLI inside one of the worker pods, I have access to the database:

$ kubectl exec \
    -it airflow-worker-569bc59df5-x6jhl \
    --namespace=composer-1-6-0-airflow-1-9-0-6f89fdb7 -- \
      mysql \
        -u root \
        -h airflow-sqlproxy-service.default

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 27147
Server version: 5.7.14-google-log (Google)

Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>

TL;DR: for anything running in your DAGs, connect using root@airflow-sqlproxy-service.default with no password. This connects to the Airflow metadata database through the Cloud SQL Proxy that's already running in your Composer environment.
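
As a rough sketch of what that could look like from Python code in a DAG, the helper below only builds a SQLAlchemy-style connection URI for the proxy service. The function name metadata_db_uri, the port 3306 (the MySQL default) and the database name passed in are assumptions for illustration; the actual database name depends on your environment and is not given above.

```python
from urllib.parse import quote

# Service name taken from the answer above; reachable only from inside the cluster.
PROXY_HOST = "airflow-sqlproxy-service.default"
# Assumption: the proxy listens on the default MySQL port.
PROXY_PORT = 3306


def metadata_db_uri(database, user="root", host=PROXY_HOST, port=PROXY_PORT):
    """Build a password-less MySQL URI pointing at the in-cluster proxy.

    `database` is a placeholder argument: look up the real database name
    in your own environment before using the result.
    """
    return "mysql://{}@{}:{}/{}".format(
        quote(user, safe=""), host, port, quote(database, safe="")
    )


if __name__ == "__main__":
    # "my-airflow-db" is a made-up name used only to show the URI shape.
    print(metadata_db_uri("my-airflow-db"))
```

The resulting string could then be handed to whatever MySQL client your tasks already use (for example an Airflow connection or a SQLAlchemy engine), assuming a MySQL driver is installed on the workers.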


If you need to connect to a database that isn't the Airflow database running in Cloud SQL, then you can spin up another proxy by deploying a new proxy pod into GKE (like you would deploy anything else into a Kubernetes cluster).
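
One possible way to do that is sketched below: a small Python script that generates a Deployment and a Service for an extra Cloud SQL Proxy as JSON, which kubectl can apply. Everything here is an assumption rather than part of the original answer: the names extra-sqlproxy and proxy_manifest, the image tag (check which proxy version is current), the v1-style /cloud_sql_proxy -instances flag, and the PostgreSQL port 5432. It also assumes the cluster's service account is allowed to connect to the target Cloud SQL instance.

```python
import json


def proxy_manifest(instance_connection_name, port=5432, name="extra-sqlproxy",
                   image="gcr.io/cloudsql-docker/gce-proxy:1.16"):
    """Return a Kubernetes List holding a Deployment and a Service for a proxy.

    `instance_connection_name` has the form PROJECT:REGION:INSTANCE.
    The image tag and resource names are illustrative defaults.
    """
    labels = {"app": name}
    container = {
        "name": "cloudsql-proxy",
        "image": image,
        # v1 proxy syntax: listen on all interfaces so other pods can connect.
        "command": [
            "/cloud_sql_proxy",
            "-instances={}=tcp:0.0.0.0:{}".format(instance_connection_name, port),
        ],
        "ports": [{"containerPort": port}],
    }
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }
    # The Service gives the proxy a stable in-cluster DNS name.
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name + "-service"},
        "spec": {"selector": labels,
                 "ports": [{"port": port, "targetPort": port}]},
    }
    return {"apiVersion": "v1", "kind": "List", "items": [deployment, service]}


if __name__ == "__main__":
    # Placeholder connection name; replace with your own instance.
    print(json.dumps(proxy_manifest("my-project:us-central1:my-instance"), indent=2))
```

Piping the script's output into kubectl apply -f - would create both objects in the current namespace, after which DAG tasks could reach the database at extra-sqlproxy-service.default, in the same way as the built-in proxy.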

-- hexacyanide
Source: StackOverflow