I am curious to know whether a Beam Dataflow job can be run on Kubernetes. I can see a lot of Spring Cloud Data Flow jobs run from Kubernetes, but not Beam Dataflow.
I can see one example, https://github.com/sanderploegsma/beam-scheduling-kubernetes/blob/master/kubernetes/cronjob.yml, but it doesn't explain how to pass args parameters like:
args: ["--runner=DataflowRunner --project=$peoject --gcpTempLocation=$gcptemp"]
To expand on this, using https://streambench.wordpress.com/2018/06/07/set-up-the-direct-runner-for-beam/ as a starting point, I want to deploy this part on Kubernetes:
beam_app_direct:
  container_name: "beam_direct_app"
  image: "beam_direct_app"
  build:
    context: .
    dockerfile: ./Dockerfile-direct
  environment:
    - YOUR_ENV_PARAMETER=42
    - ANOTHER_ENV_PARAMETER=abc
    - ...
  links:
    - ...
  # volume:
  #   - ./your-beam-app /usr/src/your-beam-app
  command: "bash ./init.sh"
but I have no idea how it can be deployed.
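My best guess is that the build section has no Kubernetes equivalent (the image would have to be built and pushed to a registry first) and that environment and command move into the container spec, roughly like the sketch below, but I am not sure this is right. For a one-shot pipeline a Job is probably the better fit, which is what I try further down.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: beam-direct-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: beam-direct-app
  template:
    metadata:
      labels:
        app: beam-direct-app
    spec:
      containers:
        - name: beam-direct-app
          image: beam_direct_app          # must be pushed to a registry the cluster can pull from
          command: ["bash", "./init.sh"]
          env:
            - name: YOUR_ENV_PARAMETER
              value: "42"
            - name: ANOTHER_ENV_PARAMETER
              value: "abc"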
Updating with more details. My cronjob.yaml file:
apiVersion: batch/v1
kind: Job
metadata:
  name: "cronjob"
spec:
  template:
    spec:
      containers:
        - name: campaignjob
          image: cronjob
          build:
            context: .
            dockerfile: ./Dockerfile
          command: "bash ./init.sh"
      restartPolicy: Never
kubectl apply -f cronjob.yaml --validate=false
I am getting the following error.
The Job "cronjob" is invalid: * spec.template.spec.containers: Required value * spec.template.spec.restartPolicy: Unsupported value: "Always": supported values: OnFailure, Never
Update: I am very surprised. I realised it is just a case of a wrong YAML file, but even after 4 days there is not a single comment. I even sent this issue to the Google team, but they are asking me to use a different technology.
From the GitHub link you have provided, the job would have to run on the master node. Within GKE, you do not have access to the master node, as it is a managed service.
I would suggest using Google Cloud Dataflow, which is built to run the jobs you describe.
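That said, if you still want a Kubernetes Job to be the thing that launches the pipeline, the pod only needs GCP credentials and the DataflowRunner flags; the actual processing then happens in Dataflow rather than in the cluster. A rough sketch of that idea, where the image, project, bucket and Secret names are placeholders, and init.sh is assumed to forward its arguments to the pipeline:

apiVersion: batch/v1
kind: Job
metadata:
  name: beam-dataflow-submit                         # hypothetical Job name
spec:
  template:
    spec:
      containers:
        - name: submitter
          image: my-beam-app:latest                  # placeholder image containing the packaged pipeline
          command: ["bash", "./init.sh"]
          args:                                      # appended to init.sh, which should pass them on
            - "--runner=DataflowRunner"
            - "--project=my-gcp-project"             # placeholder project
            - "--gcpTempLocation=gs://my-bucket/tmp" # placeholder bucket
          env:
            - name: GOOGLE_APPLICATION_CREDENTIALS
              value: /secrets/key.json               # service account key mounted below
          volumeMounts:
            - name: gcp-key
              mountPath: /secrets
              readOnly: true
      volumes:
        - name: gcp-key
          secret:
            secretName: dataflow-sa-key              # placeholder Secret holding the key
      restartPolicy: Never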