Zeppelin with Kubernetes. SPARK_HOME is not specified in interpreter-setting for non-local mode

4/12/2019

I have a Spark cluster (Master + 2 Workers) in a Kubernetes cluster (Minikube).

I want to add Zeppelin in my k8s cluster and configure it to use my Spark cluster.

So I tried to do it using either the Zeppelin 0.8.1 image from apache/zeppelin, or another image built on Zeppelin 0.9.0-SNAPSHOT (still in development).

I followed the official Zeppelin documentation (which requires at least Zeppelin 0.9.0, even though it has not been released yet ¯\_(ツ)_/¯ )

What I did:

  • Pulling the Zeppelin Docker image
  • Building the Spark Docker image
  • Downloading the zeppelin-server.yaml from the documentation
  • Editing it so that it has the correct paths to my local Spark image and Zeppelin image
  • Running kubectl apply -f on the Spark and Zeppelin YAML files (see the sketch after this list)
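A rough sketch of those steps; the image names, tags, and file names are assumptions, so adjust them to your setup:

    docker pull apache/zeppelin:0.8.1
    # build the Spark image from an unpacked Spark 2.4.0 distribution
    ./bin/docker-image-tool.sh -r my-repo -t 2.4.0 build
    kubectl apply -f spark.yaml -f zeppelin-server.yaml
    kubectl get pods    # all pods should reach Running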

I then browse to my Zeppelin notebook and try to run a small Spark test to see if it works, but I get the following error:

java.lang.RuntimeException: SPARK_HOME is not specified in interpreter-setting for non-local mode, if you specify it in zeppelin-env.sh, please move that into interpreter setting 
    at org.apache.zeppelin.interpreter.launcher.SparkInterpreterLauncher.setupPropertiesForSparkR(SparkInterpreterLauncher.java:181) 
    at org.apache.zeppelin.interpreter.launcher.SparkInterpreterLauncher.buildEnvFromProperties(SparkInterpreterLauncher.java:63) 
    at org.apache.zeppelin.interpreter.launcher.StandardInterpreterLauncher.launch(StandardInterpreterLauncher.java:86) 
    at org.apache.zeppelin.interpreter.InterpreterSetting.createInterpreterProcess(InterpreterSetting.java:698) 
    at org.apache.zeppelin.interpreter.ManagedInterpreterGroup.getOrCreateInterpreterProcess(ManagedInterpreterGroup.java:63) 
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getOrCreateInterpreterProcess(RemoteInterpreter.java:110) 
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.internal_create(RemoteInterpreter.java:163) 
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.open(RemoteInterpreter.java:131) 
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:290) 
    at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:402) 
    at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:75) 
    at org.apache.zeppelin.scheduler.Job.run(Job.java:172) 
    at org.apache.zeppelin.scheduler.AbstractScheduler.runJob(AbstractScheduler.java:121) 
    at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:187) 
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) 
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
    at java.lang.Thread.run(Thread.java:748)

First of all, I see that the error comes from the function setupPropertiesForSparkR(), even though I don't use SparkR (judging from the stack trace, the launcher apparently runs that setup step for every non-local interpreter).

But the main thing I am lost on is that, since I use the Zeppelin and Spark Docker images, I have no idea how to set SPARK_HOME, or what value it should have.
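One way to investigate is to check what is actually present inside the running Zeppelin container (the pod name below is a placeholder; look up the real one with kubectl get pods):

    kubectl exec -it zeppelin-server -- sh -c 'echo "$SPARK_HOME"; find / -name spark-submit 2>/dev/null'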

Notes:

  • I use Spark 2.4.0
  • I also tried to build the Zeppelin image manually from the sources in development, but the build fails
-- Nakeuh
apache-spark
apache-zeppelin
docker
kubernetes

1 Answer

4/12/2019

You can configure environment variables using:

docker run --env SPARK_HOME=/path ...

You can also mount your Spark installation into the container as a volume:

docker run --env SPARK_HOME=/pathInCluster -v /pathYourSparkCluster:/pathInCluster ...
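Since Zeppelin runs in Kubernetes here rather than under plain docker run, the equivalent is to set the variable on the Zeppelin pod; a minimal sketch, assuming the Deployment is named zeppelin-server:

    kubectl set env deployment/zeppelin-server SPARK_HOME=/pathInCluster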
-- Pablo López Gallego
Source: StackOverflow