Spark-on-k8s:/opt/spark/bin/spark-class: line 71: /home/deploy/jdk1.8.0_201/bin/java: No such file or directory

6/23/2021

I followed the instructions for running spark-on-k8s on https://spark.apache.org/docs/3.1.1/running-on-kubernetes.html#content

After I submitted the example, which launches Spark Pi in cluster mode, the pod hit an error and I can't understand why it happened.

This is the command line:

./bin/spark-submit \
--master k8s://https://2-120:6443 \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=ethanzhang1997/spark:3.1.1 \
local:///path/to/examples.jar

Here is the error:

/opt/spark/bin/spark-class: line 71: /home/deploy/jdk1.8.0_201/bin/java: No such file or directory

I think the container should use the Java environment inside the image, but it seems to read JAVA_HOME from my local machine instead.
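For context, this is roughly the JVM-selection logic in bin/spark-class (paraphrased from Spark 3.1.x, so treat it as an approximation, not the literal source). The key point: if JAVA_HOME is set, Spark uses $JAVA_HOME/bin/java unconditionally, so a JAVA_HOME inherited from the launch machine's config produces exactly this "No such file or directory" inside the container:

```shell
# Approximate sketch of the runner selection in bin/spark-class.
# pick_runner is a hypothetical name for illustration.
pick_runner() {
  if [ -n "${JAVA_HOME}" ]; then
    # JAVA_HOME wins unconditionally -- even if the path does not exist
    # in the current filesystem (e.g. inside a container).
    echo "${JAVA_HOME}/bin/java"
  elif command -v java >/dev/null 2>&1; then
    echo "java"
  else
    echo "JAVA_HOME is not set and no java found on PATH" >&2
    return 1
  fi
}

JAVA_HOME=/home/deploy/jdk1.8.0_201
export JAVA_HOME
pick_runner
```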

Any help would be much appreciated, thanks!


I have temporarily solved this problem. I downloaded the matching JDK version into the Spark directory and added the following lines to the Dockerfile used to build the Spark image:

RUN mkdir -p /home/deploy
ADD jdk-8u201-linux-x64.tar.gz /home/deploy/

ENV JAVA_HOME /home/deploy/jdk1.8.0_201
ENV JRE_HOME ${JAVA_HOME}/jre
ENV CLASSPATH .:${JAVA_HOME}/lib:${JRE_HOME}/lib
ENV PATH ${JAVA_HOME}/bin:$PATH

This ensures that JAVA_HOME is the same in the image and on the host machine.

But there is still one thing I don't understand. The Hadoop and Spark environments also differ between my host machine and the image. Why doesn't that cause a problem? I noticed there is a step that mounts the Spark conf directory into the image, but how does it work?

By the way, it seems the official guidance on spark-on-kubernetes uses OpenJDK 11 by default. But if a user's JAVA_HOME is not set that way, wouldn't there be a problem?
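For reference, the Dockerfile shipped in the Spark 3.1.x distribution (kubernetes/dockerfiles/spark/Dockerfile, if I recall the path correctly) builds on an OpenJDK 11 base image, roughly like this:

```dockerfile
# Approximate excerpt from Spark's bundled Dockerfile (3.1.x);
# the base image can be overridden at build time.
ARG java_image_tag=11-jre-slim
FROM openjdk:${java_image_tag}
```

So the image brings its own JDK 11; the mismatch only appears when a JAVA_HOME from outside the image overrides it.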

-- EthanZhang
apache-spark
kubernetes

1 Answer

3/10/2022

I faced the same issue recently and did a bit of testing to understand this behaviour. Below are my findings.

If an export JAVA_HOME=/path/to/java line exists in spark-env.sh in the $SPARK_HOME/conf folder of the launch machine (the machine from which spark-submit to k8s is run), then that exact value is used as the Java home both on the launch machine and inside the docker container (even if you set the JAVA_HOME environment variable in your container). I tried multiple options to override the spark-env.sh value inside the docker container, but to no avail.
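A minimal sketch of why the spark-env.sh value wins (hypothetical file under /tmp, not a real Spark install): Spark sources conf/spark-env.sh via bin/load-spark-env.sh, and a sourced `export` simply overwrites whatever JAVA_HOME the process already inherited, e.g. from the container's ENV:

```shell
# Hypothetical demo: sourcing a spark-env.sh-style file overrides
# a JAVA_HOME that was already set in the environment.
export JAVA_HOME=/usr/local/openjdk-11     # e.g. value baked into the image

cat > /tmp/fake-spark-env.sh <<'EOF'
export JAVA_HOME=/home/deploy/jdk1.8.0_201
EOF

. /tmp/fake-spark-env.sh                   # analogous to load-spark-env.sh
echo "$JAVA_HOME"
```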

However, if you remove the JAVA_HOME export from spark-env.sh, then JAVA_HOME is taken from .bash_profile (or wherever else it is exported) on both the launch machine and in the docker container.

-- ankush1377
Source: StackOverflow