I'm trying to run a Spark jar on a Kubernetes cluster and I'm getting the following error:
Local jar SparkWordCount.jar does not exist. ClassNotFoundException
I have found the issue: "SparkWordCount.jar" is not present in the Docker image "gcr.io/spark-operator/spark:v2.4.5", so I need to build a custom image that contains "SparkWordCount.jar". I just don't know how to write a Dockerfile that builds such an image with the Spark jar. Can anybody help with that?
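Below is my best guess at a minimal Dockerfile, assuming SparkWordCount.jar sits in the same directory as the Dockerfile. The target path /opt/spark/examples/jars is just a directory I picked inside the image, not something the operator requires:

# Start from the same image the SparkApplication currently uses
FROM gcr.io/spark-operator/spark:v2.4.5

# Copy the application jar from the build context into the image.
# /opt/spark/examples/jars is an arbitrary in-image path I chose.
COPY SparkWordCount.jar /opt/spark/examples/jars/SparkWordCount.jar

I would then build the image and push it to a registry the cluster can pull from (the registry name below is a placeholder):

docker build -t <my-registry>/spark-wordcount:v2.4.5 .
docker push <my-registry>/spark-wordcount:v2.4.5

Is that the right approach?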
The following is my SparkApplication manifest. I suspect the mainApplicationFile path is part of the problem; see my note right after the YAML:
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-sales-driver4
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: "gcr.io/spark-operator/spark:v2.4.5"
  imagePullPolicy: Always
  mainClass: com.example.WordCount
  mainApplicationFile: "/user/Sivaranjani/workspaces/SparkWordCount.jar"
  sparkVersion: "2.4.5"
  restartPolicy:
    type: Never
  volumes:
    - name: "test-volume"
      hostPath:
        path: "/tmp"
        type: Directory
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 2.4.5
    serviceAccount: spark
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 2.4.5
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
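If the jar is baked into the image as sketched above, my understanding from the Spark-on-Kubernetes docs is that mainApplicationFile must reference the in-image path with the local:// scheme; a bare path is resolved inside the driver container, where /user/Sivaranjani/workspaces/SparkWordCount.jar does not exist, which matches the warning in the log below. So the relevant lines would become something like (image name and jar path follow my assumptions above):

  image: "<my-registry>/spark-wordcount:v2.4.5"
  mainApplicationFile: "local:///opt/spark/examples/jars/SparkWordCount.jar"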
I executed the following command:
kubectl create -f spark.yaml
Below is the error log:
++ id -u
+ myuid=0
++ id -g
+ mygid=0
+ set +e
++ getent passwd 0
+ uidentry=root:x:0:0:root:/root:/bin/bash
+ set -e
+ '[' -z root:x:0:0:root:/root:/bin/bash ']'
+ SPARK_K8S_CMD=driver
+ case "$SPARK_K8S_CMD" in
+ shift 1
+ SPARK_CLASSPATH=':/opt/spark/jars/*'
+ env
+ sed 's/[^=]*=\(.*\)/\1/g'
+ sort -t_ -k4 -n
+ grep SPARK_JAVA_OPT_
+ readarray -t SPARK_EXECUTOR_JAVA_OPTS
+ '[' -n '' ']'
+ '[' -n '' ']'
+ PYSPARK_ARGS=
+ '[' -n '' ']'
+ R_ARGS=
+ '[' -n '' ']'
+ '[' '' == 2 ']'
+ '[' '' == 3 ']'
+ case "$SPARK_K8S_CMD" in
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=172.17.0.6 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class com.example.WordCount spark-internal
20/03/12 05:51:35 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/03/12 05:51:36 WARN DependencyUtils: Local jar /user/Sivaranjani/workspaces/SparkWordCount.jar does not exist, skipping.
20/03/12 05:51:36 WARN SparkSubmit$$anon$2: Failed to load com.example.WordCount.
java.lang.ClassNotFoundException: com.example.WordCount
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:238)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:806)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
20/03/12 05:51:36 INFO ShutdownHookManager: Shutdown hook called
20/03/12 05:51:36 INFO ShutdownHookManager: Deleting directory /tmp/spark-33e36019-a7b9-413b-8ab4-f411eae07fbc