Create SparkSession with master set to k8s

11/20/2019

I am working on a POC for Spark on Kubernetes (Spark 2.4.4).

I was able to invoke a spark job using spark-submit:

bin/spark-submit \
  --master k8s://https://localhost:8443 \
  --deploy-mode cluster \
  --conf spark.executor.instances=5 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.kubernetes.container.image=liranbo/k8s-test4:latest \
  --class org.apache.spark.examples.SparkPi \
  --name spark-pi \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.4.4.jar 40

Now I'm trying to do the same thing programmatically using the SparkSession builder:

SparkSession.builder().appName(appName).master("k8s://https://localhost:8443").getOrCreate()

And I'm getting the following exception:

org.apache.spark.SparkException: Could not parse Master URL: 'k8s://https://localhost:8443'
at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2784)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:493)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$5(SparkSession.scala:935)
at scala.Option.getOrElse(Option.scala:138)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
at SparkPi.startBySparkSessionBuilder(SparkPi.java:46)
at Runner.main(Runner.java:9)

Is it possible to set the master URL to Kubernetes when using SparkSession? If not, why is it supported by spark-submit?

BTW: I was able to start the app programmatically using SparkLauncher, but this is not the solution I'm after.
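
Roughly, the launcher-based approach looks like this (a sketch assembled from the spark-submit flags above, not my exact code; it assumes SPARK_HOME points at a Spark 2.4.4 installation):

    import org.apache.spark.launcher.SparkAppHandle;
    import org.apache.spark.launcher.SparkLauncher;

    // Sketch of the SparkLauncher approach; it shells out to spark-submit
    // under the hood, so SPARK_HOME (or setSparkHome) must be set.
    SparkAppHandle handle = new SparkLauncher()
            .setMaster("k8s://https://localhost:8443")
            .setDeployMode("cluster")
            .setAppName("spark-pi")
            .setMainClass("org.apache.spark.examples.SparkPi")
            .setAppResource("local:///opt/spark/examples/jars/spark-examples_2.11-2.4.4.jar")
            .setConf("spark.executor.instances", "5")
            .setConf("spark.kubernetes.authenticate.driver.serviceAccountName", "spark")
            .setConf("spark.kubernetes.container.image", "liranbo/k8s-test4:latest")
            .addAppArgs("40")
            .startApplication();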

I saw a (possibly) working example in sparkPy.

-- LiranBo
apache-spark
kubernetes

1 Answer

12/30/2019

Just in case someone runs into the same issue: it is caused by a missing jar. Add the following dependency:

    <dependency>
        <groupId>org.apache.spark</groupId>
        <!-- the artifact suffix is the Scala binary version, e.g. 2.11 -->
        <artifactId>spark-kubernetes_${scala.binary.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
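
With that jar on the classpath, a builder along these lines should work. This is a minimal sketch reusing the values from the question's spark-submit command (app name, image, and instance count are not mandated by Spark); note that creating the session in-process like this runs the driver in client mode, so the executor pods need to be able to reach the driver host.

    import org.apache.spark.sql.SparkSession;

    public class SparkPiOnK8s {
        public static void main(String[] args) {
            // Master URL and config values are copied from the question's
            // spark-submit command; adjust them for your own cluster.
            SparkSession spark = SparkSession.builder()
                    .appName("spark-pi")
                    .master("k8s://https://localhost:8443")
                    .config("spark.executor.instances", "5")
                    .config("spark.kubernetes.authenticate.driver.serviceAccountName", "spark")
                    .config("spark.kubernetes.container.image", "liranbo/k8s-test4:latest")
                    .getOrCreate();

            // ... submit work through `spark` here ...

            spark.stop();
        }
    }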
-- LiranBo
Source: StackOverflow