How to allow access to a Compute Engine VM in Airflow (Google Cloud Composer)

11/14/2018

I am trying to run a bash command of the pattern ssh user@host "my bash command" using the BashOperator in Airflow. This works locally because my public key is on the target machine.
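A minimal sketch of what I run (user, host, and the remote command are placeholders):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

with DAG(dag_id="remote_ssh_example",
         start_date=datetime(2018, 11, 1),
         schedule_interval=None) as dag:
    # Runs a command on a remote machine; relies on the worker's SSH key
    # being authorized on the target host.
    run_remote = BashOperator(
        task_id="run_remote_command",
        bash_command='ssh user@host "my bash command"',
    )
```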

But I would like to run this command in Google Cloud Composer, which is Airflow + Google Kubernetes Engine. I understand that Airflow's core components run in 3 pods named according to the pattern airflow-worker-xxxxxxxxx-yyyyy.

A naive solution was to create an SSH key pair for each pod and add its public key to the target machine in Compute Engine. That worked until today: somehow my 3 pods were replaced, so my SSH keys are gone. It was definitely not the best solution.

I have 2 questions:

  • Why did Google Cloud Composer change my pods?
  • How can I resolve my issue?
-- Ismail Addou
airflow
google-cloud-composer
google-kubernetes-engine

1 Answer

11/14/2018

Pod restarts are not specific to Composer; this is more related to Kubernetes itself:

Pods aren’t intended to be treated as durable entities.

So in general, pods can be restarted for various reasons, and you shouldn't rely on any changes you make to them persisting.

How can I resolve my issue?

You can solve this by taking into account that Cloud Composer creates a Cloud Storage bucket and links it to your environment. You can access the different folders of this bucket from any of your workers, so you could store your key (a single key pair is enough) in gs://bucket-name/data, which the workers can reach through the mapped directory /home/airflow/gcs/data. See the Cloud Composer docs on data stored in Cloud Storage.
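For example, a DAG along these lines could use the shared key. This is a minimal sketch: the bucket path, key file name, user, and host are placeholders, and the key is copied to /tmp first because ssh rejects private keys whose permissions are broader than 600 (which the mapped directory may not preserve):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

# The private key is assumed to live at gs://bucket-name/data/id_rsa,
# visible to the workers as /home/airflow/gcs/data/id_rsa.
SSH_COMMAND = (
    "cp /home/airflow/gcs/data/id_rsa /tmp/id_rsa && "
    "chmod 600 /tmp/id_rsa && "
    # -o StrictHostKeyChecking=no avoids an interactive host-key prompt
    # on freshly created workers.
    'ssh -i /tmp/id_rsa -o StrictHostKeyChecking=no user@host "my bash command"'
)

with DAG(dag_id="remote_ssh_with_shared_key",
         start_date=datetime(2018, 11, 1),
         schedule_interval=None) as dag:
    run_remote = BashOperator(
        task_id="run_remote_command",
        bash_command=SSH_COMMAND,
    )
```

Because the key lives in the bucket rather than on the pods, it survives pod restarts, and only the single public key has to be authorized on the Compute Engine VM.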

-- VictorGGl
Source: StackOverflow