Working on a POC for Spark on Kubernetes (Spark 2.4.4).
I was able to invoke a Spark job using spark-submit:
bin/spark-submit \
  --master k8s://https://localhost:8443 \
  --deploy-mode cluster \
  --conf spark.executor.instances=5 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.kubernetes.container.image=liranbo/k8s-test4:latest \
  --class org.apache.spark.examples.SparkPi \
  --name spark-pi \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.4.4.jar 40
Now I'm trying to do the same programmatically using the SparkSession builder:
SparkSession.builder().appName(appName).master("k8s://https://localhost:8443")
And I'm getting the following exception:
org.apache.spark.SparkException: Could not parse Master URL: 'k8s://https://localhost:8443'
at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$createTaskScheduler(SparkContext.scala:2784)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:493)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$5(SparkSession.scala:935)
at scala.Option.getOrElse(Option.scala:138)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
at SparkPi.startBySparkSessionBuilder(SparkPi.java:46)
at Runner.main(Runner.java:9)
Is it possible to set the master URL to K8s when using SparkSession? If not, why is it supported by spark-submit?
BTW: I was able to start the app programmatically using SparkLauncher, but that's not the solution I'm after.
I saw a (possibly) working example in sparkPy.
Just in case someone runs into the same issue: it is caused by a missing jar on the classpath. The k8s:// master scheme is resolved by the spark-kubernetes module, which the spark-submit distribution ships with but a plain application build does not pull in automatically; without it, SparkContext cannot parse the master URL. Add the following dependency (make sure ${scala.version} resolves to the Scala binary version, e.g. 2.11 for Spark 2.4.4, not the full patch version):
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-kubernetes_${scala.version}</artifactId>
    <version>${spark.version}</version>
</dependency>
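With that jar on the classpath, the builder call from the question should work. Here is a minimal sketch; the app name, service-account name, and container image are taken from the spark-submit command above and are placeholders for your own cluster, not verified values:

```java
import org.apache.spark.sql.SparkSession;

public class SparkPiK8s {
    public static void main(String[] args) {
        // Mirrors the spark-submit flags from the question; adjust the
        // master URL, service account, and image to match your cluster.
        SparkSession spark = SparkSession.builder()
                .appName("spark-pi")
                .master("k8s://https://localhost:8443")
                .config("spark.executor.instances", "5")
                .config("spark.kubernetes.authenticate.driver.serviceAccountName", "spark")
                .config("spark.kubernetes.container.image", "liranbo/k8s-test4:latest")
                .getOrCreate();

        // ... run the job here ...

        spark.stop();
    }
}
```

Note that launching the session this way runs the driver in the current JVM (client mode); --deploy-mode cluster is a spark-submit concept and does not apply here.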