I have a Spark cluster (Master + 2 Workers) in a Kubernetes cluster (Minikube).
I want to add Zeppelin in my k8s cluster and configure it to use my Spark cluster.
So I tried to do it using either the Zeppelin 0.8.1 image from apache/zeppelin, or another image built on Zeppelin 0.9.0-SNAPSHOT (still in development).
I followed the official Zeppelin documentation (which requires at least Zeppelin 0.9.0, even though it has not been released yet ¯\_(ツ)_/¯).
What I did:
I then browsed to my Zeppelin notebook and tried to run a small Spark test (sketched below, after the stack trace) to see if it worked, but I got the following error:
java.lang.RuntimeException: SPARK_HOME is not specified in interpreter-setting for non-local mode, if you specify it in zeppelin-env.sh, please move that into interpreter setting
at org.apache.zeppelin.interpreter.launcher.SparkInterpreterLauncher.setupPropertiesForSparkR(SparkInterpreterLauncher.java:181)
at org.apache.zeppelin.interpreter.launcher.SparkInterpreterLauncher.buildEnvFromProperties(SparkInterpreterLauncher.java:63)
at org.apache.zeppelin.interpreter.launcher.StandardInterpreterLauncher.launch(StandardInterpreterLauncher.java:86)
at org.apache.zeppelin.interpreter.InterpreterSetting.createInterpreterProcess(InterpreterSetting.java:698)
at org.apache.zeppelin.interpreter.ManagedInterpreterGroup.getOrCreateInterpreterProcess(ManagedInterpreterGroup.java:63)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getOrCreateInterpreterProcess(RemoteInterpreter.java:110)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.internal_create(RemoteInterpreter.java:163)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.open(RemoteInterpreter.java:131)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:290)
at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:402)
at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:75)
at org.apache.zeppelin.scheduler.Job.run(Job.java:172)
at org.apache.zeppelin.scheduler.AbstractScheduler.runJob(AbstractScheduler.java:121)
at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:187)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
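The test itself was nothing exotic; for illustration, a minimal paragraph of the kind I mean (not necessarily the exact code I ran):

%spark
// trivial sanity check: run a small job on the cluster and sum 1 to 10
sc.parallelize(1 to 10).sum()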
First of all, I see that the error comes from the function setupPropertiesForSparkR(), even though I don't use SparkR.
But the main thing I am lost on is that, since I use Zeppelin and Spark Docker images, I have no idea how to set SPARK_HOME, or what value it should have.
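From the error message, the mechanism at least seems clear: in 0.9.0 SPARK_HOME should be set as a property of the spark interpreter (Interpreter menu in the Zeppelin UI) rather than in zeppelin-env.sh. What I don't know is what value makes sense when everything runs in containers. For concreteness, a sketch of what I imagine it looks like (the path and the service name are guesses for my setup):

# Zeppelin UI → Interpreter → spark → edit → properties (values are assumptions):
SPARK_HOME    /opt/spark                   # must point at a Spark install inside the Zeppelin container
spark.master  spark://spark-master:7077    # Kubernetes Service name of the Spark master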
Notes:
You can configure environment variables using:
docker run --env SPARK_HOME=/path ...
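Under Kubernetes (rather than plain docker run), the same thing is an env entry in the Zeppelin pod spec; a minimal sketch (container name, image tag, and path are assumptions):

# In the Zeppelin Deployment's pod spec (names and path assumed):
containers:
- name: zeppelin
  image: apache/zeppelin:0.8.1
  env:
  - name: SPARK_HOME
    value: /opt/spark   # wherever Spark is installed in the image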
You can also mount your Spark installation into the container as a volume:
docker run --env SPARK_HOME=/pathInCluster -v /pathYourSparkCluster:/pathInCluster ...
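The Kubernetes counterpart of that -v mount is a volume plus a volumeMount; a sketch using hostPath, which works on Minikube (all paths are placeholders):

# Kubernetes equivalent of the -v mount (paths are placeholders):
containers:
- name: zeppelin
  image: apache/zeppelin:0.8.1
  env:
  - name: SPARK_HOME
    value: /pathInCluster
  volumeMounts:
  - name: spark-home
    mountPath: /pathInCluster
volumes:
- name: spark-home
  hostPath:
    path: /pathYourSparkCluster   # Spark install on the Minikube node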