I have a Spark application which I want to deploy on Kubernetes using the GCP Spark operator (https://github.com/GoogleCloudPlatform/spark-on-k8s-operator). I was able to run a Spark application using the command kubectl apply -f example.yaml, but I want to use spark-submit commands.
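For context, this is a minimal sketch of the kind of SparkApplication manifest the operator consumes with kubectl apply; the image, class, and jar path below are illustrative placeholders, not the contents of my actual example.yaml:

```yaml
# Hypothetical SparkApplication manifest for the GCP spark-on-k8s-operator.
# Image, mainClass, and mainApplicationFile are placeholder values.
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: "gcr.io/spark-operator/spark:v3.1.1"
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar"
  sparkVersion: "3.1.1"
  driver:
    cores: 1
    memory: "512m"
    serviceAccount: spark
  executor:
    cores: 1
    instances: 2
    memory: "512m"
```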
There are a few options mentioned at https://github.com/big-data-europe/docker-spark which you can try to see if they solve your problem:
kubectl run spark-base --rm -it --labels="app=spark-client" --image bde2020/spark-base:2.4.5-hadoop2.7 -- bash ./spark/bin/spark-shell --master spark://spark-master:7077 --conf spark.driver.host=spark-client
or
kubectl run spark-base --rm -it --labels="app=spark-client" --image bde2020/spark-base:2.4.5-hadoop2.7 -- bash ./spark/bin/spark-submit --class CLASS_TO_RUN --master spark://spark-master:7077 --deploy-mode client --conf spark.driver.host=spark-client URL_TO_YOUR_APP
There is no way to directly manipulate the spark-submit command that the Spark operator generates when it translates the YAML configuration file into Spark-specific options and Kubernetes resources. That is essentially the point of using the operator: it lets you use a YAML config file to run either a SparkApplication or a ScheduledSparkApplication as if it were a Kubernetes resource. Most options can be set either through Hadoop or Spark config files mounted from ConfigMaps, or as command-line arguments to the JVM in the driver and executor pods. I recommend the latter approach in order to have more flexibility when it comes to fine-tuning Spark jobs.
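As a rough sketch of what that looks like in the SparkApplication spec, the fragment below wires in a config ConfigMap, inline Spark properties, and per-pod JVM options; the ConfigMap name and the specific option values are assumptions for illustration only:

```yaml
# Illustrative fragment of a SparkApplication spec; the ConfigMap name and
# the JVM option values are assumptions, not taken from the answer above.
spec:
  # Spark properties mounted from a ConfigMap (e.g. one holding spark-defaults.conf)
  sparkConfigMap: my-spark-config
  # Inline Spark properties, equivalent to --conf key=value on spark-submit
  sparkConf:
    "spark.sql.shuffle.partitions": "200"
  driver:
    # Extra command-line arguments passed to the driver JVM
    javaOptions: "-XX:+UseG1GC"
  executor:
    # Extra command-line arguments passed to each executor JVM
    javaOptions: "-XX:+UseG1GC"
```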