Data puller and data pusher in pod or job

6/11/2018

I am trying to write a data processing unit in Kubernetes.

Every processing unit has a quite similar workflow:

  1. A puller pulls data from object storage into an /input volume mounted in the container
  2. A processor runs the code that processes the data from that volume and writes the results to an /output volume
  3. A pusher pushes the data in the /output volume back to object storage

So every pod or job must have one container acting as the data puller and another as the data pusher, sharing data through a volume as mentioned here. But how can I make the process run as a pull -> process -> push sequence?

Right now I can make this work with shared-volume communication: first the puller starts, and the data processor waits until it finds a pull-finished.txt; then the pusher starts once it finds a process-finished.txt. But this forces the data processing container to be built FROM some specific image or to use a specific entrypoint, which is not what I want. Is there a more elegant way to make this work?
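
For illustration, here is a rough sketch of what I am doing now (the image names my-puller, my-processor, my-pusher and the pull-data/process/push-data commands are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: process-unit
spec:
  restartPolicy: Never
  volumes:
  - name: input
    emptyDir: {}
  - name: output
    emptyDir: {}
  containers:
  - name: puller
    image: my-puller                # placeholder image
    # pull data into /input, then create the sentinel file
    command: ['sh', '-c', 'pull-data /input && touch /input/pull-finished.txt']
    volumeMounts:
    - name: input
      mountPath: /input
  - name: processor
    image: my-processor             # placeholder image
    # poll for the puller's sentinel file before processing
    command: ['sh', '-c', 'until [ -f /input/pull-finished.txt ]; do sleep 2; done; process /input /output && touch /output/process-finished.txt']
    volumeMounts:
    - name: input
      mountPath: /input
    - name: output
      mountPath: /output
  - name: pusher
    image: my-pusher                # placeholder image
    # poll for the processor's sentinel file before pushing
    command: ['sh', '-c', 'until [ -f /output/process-finished.txt ]; do sleep 2; done; push-data /output']
    volumeMounts:
    - name: output
      mountPath: /output

This is exactly the part I dislike: the processor's entrypoint has to be wrapped in the polling loop.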

-- aisensiy
kubernetes
pipeline
pod

1 Answer

6/12/2018

As already mentioned in the comments by Suresh Vishnoi and Janos Lenart, the best approach is to use Jobs for processing data from a queue or an input volume, and init containers to run the processing steps sequentially.

Here is a good example of using init containers from the Kubernetes documentation:

apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
  labels:
    app: myapp
spec:
  containers:
  - name: myapp-container
    image: busybox
    command: ['sh', '-c', 'echo The app is running! && sleep 3600']
  initContainers:
  - name: init-myservice
    image: busybox
    command: ['sh', '-c', 'until nslookup myservice; do echo waiting for myservice; sleep 2; done;']
  - name: init-mydb
    image: busybox
    command: ['sh', '-c', 'until nslookup mydb; do echo waiting for mydb; sleep 2; done;']
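
Applied to your pull -> process -> push workflow, a minimal sketch could look like the Job below. Init containers run one at a time, in order, and each must complete successfully before the next one starts, so no sentinel files or custom entrypoints are needed for ordering. The image names and the pull-data/process/push-data commands are hypothetical placeholders:

apiVersion: batch/v1
kind: Job
metadata:
  name: process-unit
spec:
  template:
    spec:
      restartPolicy: Never
      volumes:
      - name: input
        emptyDir: {}
      - name: output
        emptyDir: {}
      initContainers:
      - name: puller
        image: my-puller            # placeholder image
        # runs to completion before the processor starts
        command: ['sh', '-c', 'pull-data /input']
        volumeMounts:
        - name: input
          mountPath: /input
      - name: processor
        image: my-processor         # placeholder image
        # starts only after the puller succeeds; any image/entrypoint works
        command: ['sh', '-c', 'process /input /output']
        volumeMounts:
        - name: input
          mountPath: /input
        - name: output
          mountPath: /output
      containers:
      - name: pusher
        image: my-pusher            # placeholder image
        # the main container starts after all init containers finish
        command: ['sh', '-c', 'push-data /output']
        volumeMounts:
        - name: output
          mountPath: /output

The pod succeeds only when the pusher exits successfully, and the Job controller recreates the pod (rerunning the whole sequence) on failure, up to its backoffLimit.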

You can find another good example in the answer provided by Janos Lenart.

-- VAS
Source: StackOverflow