Recovery of crashed long-running Spark jobs on Kubernetes

12/13/2019

I am currently using spark-on-k8s-operator to deploy Spark jobs in my Kubernetes cluster, and I was wondering about the following:

If the Spark driver pod crashes during a long-running Spark job, is there any way to recover the progress made by that job? Since the executors are automatically killed when the driver crashes, that seems impossible.
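
For context, here is a minimal sketch of the kind of long-running job I have in mind, assuming a Structured Streaming query whose only durable record of progress is its checkpoint location (the paths, bucket, topic, and broker names below are placeholders, not my actual setup):

```python
# Sketch of a long-running PySpark Structured Streaming job.
# Assumptions: the spark-sql-kafka package is on the classpath, and the
# checkpoint/output paths point at storage that outlives the driver pod
# (e.g. an object store or a PersistentVolume-backed mount).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("long-running-stream")
    .getOrCreate()
)

# Source: hypothetical Kafka topic and bootstrap server.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")
    .option("subscribe", "events")
    .load()
)

# Sink: checkpointLocation holds the committed offsets and state; if it
# survives a driver crash, a restarted driver can resume from it rather
# than reprocessing from the beginning.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "s3a://my-bucket/output/")
    .option("checkpointLocation", "s3a://my-bucket/checkpoints/long-running-stream/")
    .start()
)

query.awaitTermination()
```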

-- toerq
apache-spark
kubernetes
pyspark
spark-streaming

0 Answers