Airflow kubernetes executor: Run 2 jobs on the same pod

6/11/2019

I'm using Airflow with the Kubernetes executor and the KubernetesPodOperator. I have two jobs:

  • A: Retrieve up to 100 MB of data from some source.
  • B: Analyze the data from A.

To share the data between the jobs, I would like to run them in the same pod, so that A writes the data to a volume and B reads it from that volume.

The documentation states:

The Kubernetes executor will create a new pod for every task instance.

Is there any way to achieve this? If not, what is the recommended way to pass data between the jobs?

-- matanper
airflow
kubernetes

4 Answers

6/14/2019

You can have two separate tasks, A and B, and hand the data off from A to B. Kubernetes has out-of-the-box support for persistent volumes of this kind, e.g. https://kubernetes.io/docs/concepts/storage/volumes/#awselasticblockstore. Data written to such a volume by one pod persists, so it is not lost when the pod is deleted, and another pod can then mount the same volume and read the data.
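A minimal sketch of this hand-off, assuming the cluster has a default StorageClass that dynamically provisions EBS-backed volumes. The names shared-data, task-a and task-b, the busybox commands and the /data path are hypothetical placeholders for the real retrieval and analysis steps. EBS-backed claims are ReadWriteOnce (attachable to one node at a time), so the two pods are meant to run one after the other. In Airflow that would be two KubernetesPodOperator tasks with A >> B; applying this file directly with kubectl would start both pods at once.

```yaml
# Hypothetical claim; relies on the cluster's default StorageClass
# (e.g. an EBS-backed one on AWS) to provision the volume.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
spec:
  accessModes:
  - ReadWriteOnce          # EBS volumes attach to a single node at a time
  resources:
    requests:
      storage: 1Gi         # headroom for the ~100 MB produced by task A
---
# Task A: writes its output onto the shared volume.
apiVersion: v1
kind: Pod
metadata:
  name: task-a
spec:
  restartPolicy: Never
  containers:
  - name: retrieve
    image: busybox:1.28
    command: ['sh', '-c', 'echo "retrieved data" > /data/input.txt']
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: shared-data
---
# Task B: meant to start after task A has finished; mounts the same claim read-only.
apiVersion: v1
kind: Pod
metadata:
  name: task-b
spec:
  restartPolicy: Never
  containers:
  - name: analyze
    image: busybox:1.28
    command: ['sh', '-c', 'cat /data/input.txt']
    volumeMounts:
    - name: data
      mountPath: /data
      readOnly: true
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: shared-data
```

Because the claim outlives both pods, it has to be deleted separately once the data is no longer needed.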

-- sdvd
Source: StackOverflow

6/11/2019

Sorry, this isn't possible: it's one job per pod.

Your best option is to have the first task put the data in a well-known location (e.g. a cloud storage bucket) and have the second task fetch it from there. Or just combine the two tasks into one.

-- eamon1234

6/12/2019

You can absolutely accomplish this using SubDAGs and the SubDagOperator. When you start a SubDAG, the Kubernetes executor creates one pod at the SubDAG level, and all of its subtasks run in that pod.

This behavior does not seem to be documented. We just discovered this recently when troubleshooting a process.

-- trejas

6/12/2019

Yes, you can do that using init containers inside the Job: within the same pod, the main container will not start until the init containers have completed their tasks.

apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
  labels:
    app: myapp
spec:
  containers:
  # Main container: started only after every init container below has exited successfully.
  - name: myapp-container
    image: busybox:1.28
    command: ['sh', '-c', 'echo The app is running! && sleep 3600']
  initContainers:
  # Init containers run one at a time, in the order listed.
  - name: init-myservice
    image: busybox:1.28
    command: ['sh', '-c', 'until nslookup myservice; do echo waiting for myservice; sleep 2; done;']
  - name: init-mydb
    image: busybox:1.28
    command: ['sh', '-c', 'until nslookup mydb; do echo waiting for mydb; sleep 2; done;']

This is an example for a Pod; you can apply the same to kind: Job, where the pod spec goes under spec.template.
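Applied to the question, job A could run as an init container and job B as the main container of a single Job, sharing an emptyDir volume that exists for the lifetime of the pod. This is a minimal sketch: the Job name, the scratch volume, the busybox commands and the /scratch path are hypothetical stand-ins for the real retrieval and analysis steps. Note that Airflow would see the whole thing as one task, not as two separate task instances.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: retrieve-and-analyze      # hypothetical name
spec:
  backoffLimit: 0                 # do not retry the pod if a step fails
  template:
    spec:
      restartPolicy: Never        # Job pods must use Never or OnFailure
      volumes:
      - name: scratch
        emptyDir: {}              # lives as long as the pod; shared by all of its containers
      initContainers:
      # Step A: must exit successfully before the main container starts.
      - name: retrieve
        image: busybox:1.28
        command: ['sh', '-c', 'echo "retrieved data" > /scratch/input.txt']
        volumeMounts:
        - name: scratch
          mountPath: /scratch
      containers:
      # Step B: reads what step A left on the shared volume.
      - name: analyze
        image: busybox:1.28
        command: ['sh', '-c', 'wc -c /scratch/input.txt']
        volumeMounts:
        - name: scratch
          mountPath: /scratch
```

An emptyDir is deleted together with the pod, so anything B needs to keep has to be written elsewhere before the pod finishes.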

-- Semah Mhamdi