How can I run spark-submit commands using the GCP Spark operator on Kubernetes?

May 7, 2020

I have a Spark application that I want to deploy on Kubernetes using the GCP Spark operator (https://github.com/GoogleCloudPlatform/spark-on-k8s-operator). I was able to run a Spark application with kubectl apply -f example.yaml, but I want to use spark-submit commands.
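For reference, a minimal example.yaml of the kind the operator accepts might look like the following sketch, adapted from the operator's SparkPi example (image, class, and jar path are illustrative and should be adjusted to your cluster):

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi                 # illustrative name
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: "gcr.io/spark-operator/spark:v2.4.5"   # illustrative image
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.11-2.4.5.jar"
  sparkVersion: "2.4.5"
  restartPolicy:
    type: Never
  driver:
    cores: 1
    memory: "512m"
    serviceAccount: spark        # assumes a service account with pod-creation rights
  executor:
    cores: 1
    instances: 2
    memory: "512m"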

-- Dhruv Singh Chandel
apache-spark
kubernetes
google-cloud-platform
google-kubernetes-engine
spark-submit

2 Answers

May 7, 2020

There are a few options mentioned by https://github.com/big-data-europe/docker-spark; see if one of them solves your problem:

kubectl run spark-base --rm -it --labels="app=spark-client" \
  --image bde2020/spark-base:2.4.5-hadoop2.7 \
  -- bash ./spark/bin/spark-shell \
     --master spark://spark-master:7077 \
     --conf spark.driver.host=spark-client

or

kubectl run spark-base --rm -it --labels="app=spark-client" \
  --image bde2020/spark-base:2.4.5-hadoop2.7 \
  -- bash ./spark/bin/spark-submit \
     --class CLASS_TO_RUN \
     --master spark://spark-master:7077 \
     --deploy-mode client \
     --conf spark.driver.host=spark-client \
     URL_TO_YOUR_APP
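For instance, to run the bundled SparkPi example with the second form (the jar path is an assumption based on the bde2020 image layout, where Spark lives under /spark):

kubectl run spark-base --rm -it --labels="app=spark-client" \
  --image bde2020/spark-base:2.4.5-hadoop2.7 \
  -- bash ./spark/bin/spark-submit \
     --class org.apache.spark.examples.SparkPi \
     --master spark://spark-master:7077 \
     --deploy-mode client \
     --conf spark.driver.host=spark-client \
     /spark/examples/jars/spark-examples_2.11-2.4.5.jar 100

Note that these commands submit against a standalone Spark master (spark://spark-master:7077) rather than through the operator.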
-- QuickSilver
Source: StackOverflow

June 20, 2020

There is no way to directly manipulate the spark-submit command that the Spark operator generates when it translates the YAML configuration file into Spark-specific options and Kubernetes resources. That is essentially the point of using the operator: it lets you use a YAML config file to run either a SparkApplication or a ScheduledSparkApplication as if it were a Kubernetes resource.

Most options can be set either through Hadoop or Spark config files in config maps, or as command-line arguments to the JVM in the driver and executor pods. I recommend the latter approach in order to have more flexibility when fine-tuning Spark jobs.
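As a sketch of that approach, the operator's v1beta2 SparkApplication spec exposes sparkConf, hadoopConf, and per-pod javaOptions fields; the values below are illustrative, not from the original answer:

spec:
  sparkConf:                          # rendered as --conf key=value at submit time
    "spark.eventLog.enabled": "true"
  hadoopConf:                         # exposed to driver/executors via HADOOP_CONF_DIR
    "fs.gs.project.id": "my-project"  # illustrative GCS connector property
  driver:
    javaOptions: "-XX:+UseG1GC"       # extra JVM args for the driver pod
  executor:
    javaOptions: "-XX:+UseG1GC"       # extra JVM args for the executor pods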

-- Pablo Flores
Source: StackOverflow