How to run Spark in client mode with node autoscaling

9/3/2019

I have a requirement for which I cannot find a suitable answer.

We have a desktop application from which we wish to burst calculations to Spark. The requirement is that the desktop application behaves more like a notebook, in the sense that it maintains a context with the cluster; for example, we can reuse cached DataFrames for efficient recalculation.

So the spark-submit or REST-API model of submitting Spark jobs is not appropriate for this use case. I had thought a suitable design would be to create a web server in the same VNET as Spark and use that web server as my driver node, creating a SparkSession from there and servicing user requests via a web API.

The idea is that the UI sends the web server a model definition, and the web server turns that into a Spark graph; the user can then iteratively process nodes of that graph, modify them, recalculate, and extend the graph, all without having to start from scratch - so like a notebook and NOT a job.
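To make the design concrete, here is a minimal sketch of the web-server-as-driver idea (Flask, the /calculate endpoint, the master URL, and the model-to-DataFrame translation are all placeholders for illustration, not a working design):

    from flask import Flask, request, jsonify
    from pyspark.sql import SparkSession

    app = Flask(__name__)

    # One long-lived SparkSession serves as the "notebook" context.
    spark = (SparkSession.builder
             .appName("desktop-burst-driver")
             .master("spark://spark-master:7077")  # placeholder cluster URL
             .getOrCreate())

    # Cached DataFrames keyed by graph-node id, so recalculation and
    # graph extension can reuse earlier results.
    node_cache = {}

    @app.route("/calculate", methods=["POST"])
    def calculate():
        node_id = request.json["node_id"]
        if node_id not in node_cache:
            # Stand-in for translating the model node into a DataFrame.
            node_cache[node_id] = spark.range(10**6).cache()
        # Stand-in for the real action on the node's DataFrame.
        result = node_cache[node_id].count()
        return jsonify({"node_id": node_id, "result": result})

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=8080)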

However, I cannot find a way to reliably have that design while also having an auto-scaled cluster (e.g. the Azure AKS node autoscaler):

  • Databricks does not support client mode
  • I can't find any evidence that HDInsight Spark does either
  • Spark on Kubernetes does, but it does not support dynamic allocation at this time
  • Even if Spark on Kubernetes did support dynamic allocation, I believe I still wouldn't get the required node scale-down, as the shuffle service would prevent the node from being deallocated (see the config sketch after this list)
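For reference, the dynamic-allocation settings I would be enabling look roughly like this (standard Spark configuration keys; the minExecutors/maxExecutors values are just examples). Note that on a classic cluster manager, dynamic allocation requires the external shuffle service - the very component that pins shuffle data to a node and blocks scale-down:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("autoscaling-driver")
             # Standard dynamic allocation knobs.
             .config("spark.dynamicAllocation.enabled", "true")
             .config("spark.dynamicAllocation.minExecutors", "0")
             .config("spark.dynamicAllocation.maxExecutors", "50")
             # Required for dynamic allocation on classic cluster managers,
             # but it keeps shuffle files on the node, which stops the
             # node autoscaler from deallocating it.
             .config("spark.shuffle.service.enabled", "true")
             .getOrCreate())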

So what can I do?

I understand Databricks has both a REST API and Databricks Connect, which could work; however, I don't want to be tied into the Databricks ecosystem. Similarly, Apache Livy and Job-Server are not fit for purpose, as they are focused more on job submission.

What I am after is a Spark cluster environment in which I can have a long-running driver process while scaling workers based on demand. I am not married to the idea of client mode, or of having a SparkSession remote to the cluster, but I am assuming this is the only way?

-- user1371314
apache-spark
azure
azure-databricks
kubernetes
pyspark

0 Answers