I followed the instructions for running Spark on Kubernetes at https://spark.apache.org/docs/3.1.1/running-on-kubernetes.html#content
After submitting the example that launches Spark Pi in cluster mode, the pod hit an error and I can't understand why it happened.
This is the command line:
./bin/spark-submit \
--master k8s://https://2-120:6443 \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=ethanzhang1997/spark:3.1.1 \
local:///path/to/examples.jar
Here is the error: [screenshot of the error output]
I think the container should use the Java environment inside the image, but it tried to read JAVA_HOME from my local machine.
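For reference, this is roughly how I dig the error out of the cluster (the driver pod name below is just a placeholder, not my actual one):
kubectl get pods                                  # find the driver pod, something like spark-pi-xxxxxxxxxx-driver
kubectl logs spark-pi-xxxxxxxxxx-driver           # the driver's full log output
kubectl describe pod spark-pi-xxxxxxxxxx-driver   # pod events, image and mounted volumes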
Any help would be much appreciated!
I have now temporarily solved this problem. I downloaded the corresponding JDK version into the Spark directory and added the following lines to the Dockerfile that builds the Spark image:
# Unpack the bundled JDK into the image (ADD auto-extracts the tar.gz)
RUN mkdir -p /home/deploy
ADD jdk-8u201-linux-x64.tar.gz /home/deploy/
# Point the Java-related environment variables at the unpacked JDK
ENV JAVA_HOME /home/deploy/jdk1.8.0_201
ENV JRE_HOME ${JAVA_HOME}/jre
ENV CLASSPATH .:${JAVA_HOME}/lib:${JRE_HOME}/lib
ENV PATH ${JAVA_HOME}/bin:$PATH
This ensures that I have the same JAVA_HOME inside the image as on my host machine.
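For reference, rebuilding and pushing the image can be done with the helper script bundled in the Spark 3.1.1 distribution; this is only a sketch, with the repository name taken from my submit command (a plain docker build on the edited Dockerfile works as well):
# run from the root of the Spark 3.1.1 distribution, after editing the Dockerfile
./bin/docker-image-tool.sh -r docker.io/ethanzhang1997 -t 3.1.1 build
./bin/docker-image-tool.sh -r docker.io/ethanzhang1997 -t 3.1.1 push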
But there is still one thing I don't understand. The Hadoop and Spark environments also differ between my host machine and the image, so why doesn't that cause a problem? I noticed that there is a step that mounts a Spark directory into the image, but how does it work?
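One way to at least see what ends up inside the running driver is to look at the pod directly; the pod name and the /opt/spark/conf path below are only my guesses based on the default image layout:
kubectl get configmaps                                                # spark-submit creates a ConfigMap holding the driver conf
kubectl exec -it spark-pi-xxxxxxxxxx-driver -- ls -l /opt/spark/conf  # list whatever is actually mounted there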
By the way, it seems that the official guidance on Spark on Kubernetes uses openjdk 11 as the default base image. But if a user's JAVA_HOME is not set up that way, wouldn't there be a problem?
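If the openjdk 11 default is the concern, the stock Dockerfile in the 3.1.1 distribution appears to take the base image as a build argument, so a Java 8 base can be chosen instead of baking a separate JDK into the image (the tag 3.1.1-java8 is just an example name):
./bin/docker-image-tool.sh -r docker.io/ethanzhang1997 -t 3.1.1-java8 -b java_image_tag=8-jre-slim build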
I faced the same issue recently and did a bit of testing to understand this behaviour. Below are my findings.
If there is an
export JAVA_HOME=/path/to/java
line in spark-env.sh in the launch machine's $SPARK_HOME/conf folder (the machine from which spark-submit to k8s is run), then that exact value is used as the Java home both on the launch machine and inside the Docker container (even if you set the JAVA_HOME environment variable in your container). I tried multiple options to override the spark-env value inside the Docker container, but to no avail.
However, if you remove the JAVA_HOME export from spark-env.sh, then JAVA_HOME is taken from .bash_profile (or wherever else it is exported) on both the launch machine and inside the Docker container.
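So the practical workaround on the launch machine is to drop the unconditional export from $SPARK_HOME/conf/spark-env.sh, or to guard it so it only applies when nothing else has set JAVA_HOME. Based on the behaviour above this should let each side keep its own value; the JDK path below is only an example for the host:
# $SPARK_HOME/conf/spark-env.sh on the launch machine
# export JAVA_HOME=/path/to/java     <- remove or comment out the unconditional export
# or, only set it when it is not already defined:
if [ -z "${JAVA_HOME}" ]; then
  export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64   # example host JDK path, adjust to your machine
fi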