How to run a Spark Standalone master on Kubernetes that will use the Kubernetes Cluster Manager to start workers

9/12/2019

I have an application that currently uses Standalone Mode locally to access Spark functionality via the SparkContext. We are not using spark-submit to upload our jobs. We run our application in a container on Kubernetes, so we would like to take advantage of the dynamic scheduling that Kubernetes provides to run the jobs.

We started out looking for a Helm chart to create a standalone cluster running on Kubernetes, similar to how you would have run a standalone cluster on machines (VMs or actual machines) a few years ago, and came across the following:

https://github.com/helm/charts/tree/master/stable/spark

Issues:

Very old versions of Spark
Not using the containers provided by Spark
This setup wastes a lot of resources if you need large worker nodes reserved and running all the time, regardless of your need

Next we looked at the spark-operator approach here: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator

Issues:

Doesn't support the way we interact with Spark; it takes the approach that all apps are standalone apps that are pushed to the cluster to run
No long-running master that allows us to take advantage of cached resources in the cluster

Along this journey we discovered that Spark now supports a Kubernetes cluster manager (similar to the way it does with YARN and Mesos), so we think this might be the best approach. However, it still does not provide a standalone master that would allow for in-memory caching. I have looked to see whether there is a way to get org.apache.spark.deploy.master.Master to start and use org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.
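
For context, here is a rough sketch of the Kubernetes cluster manager being used from a long-running application in client mode, without spark-submit. It assumes Spark 2.4 or later (where client mode on Kubernetes is available) and that the application's pod is reachable by executors at the address given in spark.driver.host. The function names, API server URL, image name, namespace, and host name are hypothetical placeholders, not values from our setup.

```python
from typing import Dict


def k8s_client_mode_conf(
    api_server: str,
    image: str,
    namespace: str,
    driver_host: str,
    executor_instances: int = 2,
) -> Dict[str, str]:
    """Build Spark properties for a client-mode driver that requests executor pods."""
    if executor_instances < 1:
        raise ValueError("executor_instances must be at least 1")
    return {
        # The k8s:// prefix selects the Kubernetes cluster manager.
        "spark.master": f"k8s://{api_server}",
        # Client mode: the driver lives inside this application process.
        "spark.submit.deployMode": "client",
        "spark.kubernetes.namespace": namespace,
        "spark.kubernetes.container.image": image,
        "spark.executor.instances": str(executor_instances),
        # Executors connect back to the driver at this address
        # (assumed here to be a service name that resolves to the app's pod).
        "spark.driver.host": driver_host,
    }


def start_session(app_name: str, conf: Dict[str, str]):
    """Create a SparkSession from the properties (needs pyspark and a cluster)."""
    from pyspark.sql import SparkSession  # imported lazily on purpose

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    example = k8s_client_mode_conf(
        api_server="https://kubernetes.default.svc:443",
        image="example.com/spark:latest",
        namespace="spark-jobs",
        driver_host="my-app-driver.spark-jobs.svc",
    )
    for key in sorted(example):
        print(f"{key}={example[key]}")
```

In this arrangement the application itself plays the role of the permanent process: executors are pods requested from Kubernetes, and there is no Standalone Master involved, which is exactly the gap described above.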

So I guess what I'm trying to ask is: does anyone have any experience running a Standalone Master that uses the Kubernetes backend, such as "KubernetesClusterManager", so that worker nodes are dynamically created as pods and run executors, while a permanent Standalone Master allows a SparkContext to connect to it remotely in client mode?
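
To make the "dynamically created as pods" part concrete, the sketch below lists the dynamic allocation properties that could be merged into whatever configuration the application passes to its SparkSession. It assumes Spark 3.0 or later, where shuffle tracking lets dynamic allocation work without an external shuffle service. The function name and default values are illustrative assumptions. This is not a way to make the Standalone Master use the Kubernetes backend.

```python
from typing import Dict


def dynamic_allocation_conf(
    min_executors: int = 0,
    max_executors: int = 10,
    idle_timeout_seconds: int = 60,
) -> Dict[str, str]:
    """Build Spark properties that let the executor count grow and shrink with load."""
    if min_executors < 0:
        raise ValueError("min_executors must not be negative")
    if max_executors < min_executors:
        raise ValueError("max_executors must be >= min_executors")
    if idle_timeout_seconds <= 0:
        raise ValueError("idle_timeout_seconds must be positive")
    return {
        "spark.dynamicAllocation.enabled": "true",
        # Assumes Spark 3.0+: track shuffle data instead of requiring
        # an external shuffle service.
        "spark.dynamicAllocation.shuffleTracking.enabled": "true",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
        # Executors idle for this long become candidates for removal.
        "spark.dynamicAllocation.executorIdleTimeout": f"{idle_timeout_seconds}s",
    }


if __name__ == "__main__":
    for key, value in sorted(dynamic_allocation_conf(max_executors=5).items()):
        print(f"{key}={value}")
```

One detail relevant to the caching concern: executors that hold cached blocks are governed by spark.dynamicAllocation.cachedExecutorIdleTimeout, which defaults to infinity, so by default they are not released just for being idle.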

-- Adam Carbone

apache-spark

kubernetes
