Submitting to a Spark Master inside a Kubernetes cluster

4/4/2019

I have a Kubernetes cluster consisting of a single VM (a minikube cluster).

On this cluster, I have a Spark Master and two Workers running. I have set up the Ingress addon as follows (my Spark components use the default ports):

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: minikube-ingress
  annotations:
spec:
  rules:
  - host: spark-kubernetes
    http:
      paths:
      - path: /web-ui
        backend:
          serviceName: spark-master
          servicePort: 8080  # Spark Master web UI
      - path: /
        backend:
          serviceName: spark-master
          servicePort: 7077  # Spark Master RPC port
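
To apply and verify the Ingress, standard kubectl commands can be used (the file name ingress.yaml is just an example):

kubectl apply -f ingress.yaml
kubectl get ingress minikube-ingress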

And I added my minikube IP to my /etc/hosts:

[MINIKUBE_IP] spark-kubernetes

I am able to connect to the Master web UI through http://spark-kubernetes/web-ui.

I now want to submit a JAR stored on my local machine (the spark-examples JAR, for example). I expected this command to work:

./bin/spark-submit \
    --master spark://spark-kubernetes \
    --deploy-mode cluster \
    --class org.apache.spark.examples.SparkPi \
    ./examples/jars/spark-examples_2.11-2.4.0.jar

But I get the following error:

2019-04-04 08:52:36 WARN  SparkSubmit$$anon$2:87 - Failed to load .
java.lang.ClassNotFoundException: 
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:238)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:810)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

What have I done wrong?

Notes:

  • I know that with Spark 2.4 I can run a cluster without a standalone Master and submit directly to Kubernetes (see the sketch after this list), but I want to do it with a Master for now
  • I use Spark 2.4
  • I use Kubernetes 1.14
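
For reference, a direct-to-Kubernetes submission in Spark 2.4 would look roughly like this (the API server address and container image are placeholders, not values from my setup):

./bin/spark-submit \
    --master k8s://https://[K8S_API_SERVER]:8443 \
    --deploy-mode cluster \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.kubernetes.container.image=[SPARK_IMAGE] \
    local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar
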
-- Nakeuh
apache-spark
kubernetes

1 Answer

4/6/2019

To make it work, either use client mode, which distributes the jars from the submitting machine (--deploy-mode client), or point spark-submit at a jar path that exists inside the container image. So instead of ./examples/jars/spark-examples_2.11-2.4.0.jar, use something like /opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar (depending on the image you use).
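
For example, either of these should work (assuming spark-kubernetes:7077 is reachable from the machine running spark-submit; adjust the host and port to your setup):

# client mode: the local jar is distributed to the cluster
./bin/spark-submit \
    --master spark://spark-kubernetes:7077 \
    --deploy-mode client \
    --class org.apache.spark.examples.SparkPi \
    ./examples/jars/spark-examples_2.11-2.4.0.jar

# cluster mode: the jar path must exist inside the worker containers
./bin/spark-submit \
    --master spark://spark-kubernetes:7077 \
    --deploy-mode cluster \
    --class org.apache.spark.examples.SparkPi \
    /opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar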

Also check out my Spark operator for K8s: https://github.com/radanalyticsio/spark-operator :)
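
With the operator, a small standalone cluster can be declared as a custom resource; this is a sketch based on the project's README, so check the repo for the current schema:

apiVersion: radanalytics.io/v1
kind: SparkCluster
metadata:
  name: my-spark-cluster
spec:
  master:
    instances: "1"
  worker:
    instances: "2"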

-- Jiri Kremser
Source: StackOverflow