How to run Spark in client mode with node autoscaling

9/3/2019

I have a requirement for which I cannot find a suitable answer.

We have a desktop application from which we wish to burst calculations to Spark. The requirement is that the desktop application behaves more like a notebook, in the sense that it maintains a context with the cluster; for example, we can reuse cached DataFrames for efficient recalculation.

So the spark-submit or REST-API model of submitting Spark jobs is not appropriate for this use case. I had thought a suitable design would be to create a web server in the same VNET as Spark and use that web server as my driver node, creating a SparkSession from there and servicing user requests via a web API.

The idea is that the UI sends the web server a model definition, and the web server turns that into a Spark graph; the user can then iteratively process nodes of that graph, modify them, recalculate, and extend the graph, all without having to start from scratch - so like a notebook and NOT a job.
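To make the design concrete, here is a minimal sketch of the web-server-as-driver idea (Flask, the /calculate endpoint, the master URL, and the model-to-DataFrame translation are all placeholders for illustration, not a working design):

    from flask import Flask, request, jsonify
    from pyspark.sql import SparkSession

    app = Flask(__name__)

    # One long-lived SparkSession serves as the "notebook" context.
    spark = (SparkSession.builder
             .appName("desktop-burst-driver")
             .master("spark://spark-master:7077")  # placeholder cluster URL
             .getOrCreate())

    # Cached DataFrames keyed by graph-node id, so recalculation and
    # graph extension can reuse earlier results.
    node_cache = {}

    @app.route("/calculate", methods=["POST"])
    def calculate():
        node_id = request.json["node_id"]
        if node_id not in node_cache:
            # Stand-in for translating the model node into a DataFrame.
            node_cache[node_id] = spark.range(10**6).cache()
        # Stand-in for the real action on the node's DataFrame.
        result = node_cache[node_id].count()
        return jsonify({"node_id": node_id, "result": result})

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=8080)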

However, I cannot find a way to reliably have that design while also having an auto-scaled cluster (e.g. the Azure AKS node autoscaler):

  • Databricks does not support client mode
  • I can't find any evidence that HDInsight Spark does either
  • Spark on Kubernetes does, but it does not support dynamic allocation at this time
  • Even if Spark on Kubernetes did support dynamic allocation, I believe I still wouldn't get the required node scale-down, as the shuffle service would prevent the node from being deallocated (see the config sketch after this list)
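For reference, the dynamic-allocation settings I would be enabling look roughly like this (standard Spark configuration keys; the minExecutors/maxExecutors values are just examples). Note that on a classic cluster manager, dynamic allocation requires the external shuffle service - the very component that pins shuffle data to a node and blocks scale-down:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("autoscaling-driver")
             # Standard dynamic allocation knobs.
             .config("spark.dynamicAllocation.enabled", "true")
             .config("spark.dynamicAllocation.minExecutors", "0")
             .config("spark.dynamicAllocation.maxExecutors", "50")
             # Required for dynamic allocation on classic cluster managers,
             # but it keeps shuffle files on the node, which stops the
             # node autoscaler from deallocating it.
             .config("spark.shuffle.service.enabled", "true")
             .getOrCreate())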

So what can I do?

I understand Databricks has both a REST API and Databricks Connect, which could work; however, I don't want to be tied into the Databricks ecosystem. Similarly, Apache Livy and Job-Server are not fit for purpose, as they are focused more on job submission.

What I am after is a Spark cluster environment in which I can have a long-running driver process while scaling workers based on demand. I am not married to the idea of client mode, or of having a SparkSession remote to the cluster, but I am assuming this is the only way?

-- user1371314
apache-spark
azure
azure-databricks
kubernetes
pyspark

0 Answers