How to let Google Cloud Composer (airflow) run jobs on a different kubernetes cluster?

9/2/2019

I want to have my Cloud Composer environment (Google Cloud's managed Apache Airflow service) start pods on a different kubernetes cluster. How should I do this?

Note that Cloud Composer runs Airflow on a Kubernetes cluster of its own; that cluster is considered the Composer "environment". With the default values for the KubernetesPodOperator, Composer schedules pods on that same cluster. In this case, however, I have a different Kubernetes cluster on which I want to run the pods.

I can connect to the worker pods and run gcloud container clusters get-credentials CLUSTERNAME there, but the pods get recycled every now and then, so this is not a durable solution.

I noticed that the KubernetesPodOperator has both an in_cluster and a cluster_context argument, which seem useful. I would expect that this would work:

from airflow.contrib.operators import kubernetes_pod_operator

pod = kubernetes_pod_operator.KubernetesPodOperator(
    task_id='my-task',
    name='name',
    in_cluster=False,
    cluster_context='my_cluster_context',
    image='gcr.io/my/image:version'
)

But this results in:

kubernetes.config.config_exception.ConfigException: Invalid kube-config file. Expected object with name CONTEXTNAME in kube-config/contexts list

Yet if I run kubectl config get-contexts in the worker pods, I can see the cluster's context listed.

So what I can't figure out is:

  • how to make sure that the context for my other Kubernetes cluster is available on the worker pods (or should that be on the nodes?) of my Composer environment, and
  • once the context is available (as I set it manually for testing), how to tell Airflow to use it.
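For completeness, I also noticed the operator takes a config_file argument, so I imagine something like the following sketch could work by pointing the operator at an explicitly uploaded kubeconfig (untested; the kubeconfig path below is an assumption on my part):

```python
# Sketch (untested): point KubernetesPodOperator at an explicit kubeconfig.
# Assumes a kubeconfig containing the context 'my_cluster_context' has been
# uploaded to the environment's data folder, which Cloud Composer mounts on
# the workers under /home/airflow/gcs/data/ (path is an assumption here).
from airflow.contrib.operators import kubernetes_pod_operator

pod = kubernetes_pod_operator.KubernetesPodOperator(
    task_id='my-task',
    name='name',
    namespace='default',
    in_cluster=False,
    config_file='/home/airflow/gcs/data/kubeconfig',
    cluster_context='my_cluster_context',
    image='gcr.io/my/image:version',
)
```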
-- bartaelterman
airflow
google-cloud-composer
google-cloud-platform
kubernetes

1 Answer

9/2/2019

Check out the GKEPodOperator for this. It extends the KubernetesPodOperator and fetches the credentials for the target GKE cluster itself, so you don't have to manage kubeconfig contexts on the workers.

Example usage from the docs:

from airflow.contrib.operators.gcp_container_operator import GKEPodOperator

operator = GKEPodOperator(task_id='pod_op',
                          project_id='my-project',
                          location='us-central1-a',
                          cluster_name='my-cluster-name',
                          name='task-name',
                          namespace='default',
                          image='perl')
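For context, here is a minimal DAG sketch built around that snippet. It assumes Airflow 1.10 with the GCP contrib operators (as shipped with Cloud Composer); the project ID, zone, and cluster name are placeholders you would replace with your own:

```python
# Sketch: a minimal DAG that runs a pod on a separate GKE cluster.
# Assumes Airflow 1.10 with the contrib GCP operators installed;
# project_id, location, and cluster_name are placeholder values.
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.gcp_container_operator import GKEPodOperator

with DAG(
    dag_id='run_pod_on_other_cluster',
    start_date=datetime(2019, 9, 1),
    schedule_interval=None,  # trigger manually
) as dag:
    run_pod = GKEPodOperator(
        task_id='pod_op',
        project_id='my-project',
        location='us-central1-a',
        cluster_name='my-cluster-name',
        name='task-name',
        namespace='default',
        image='perl',
    )
```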
-- ECris
Source: StackOverflow