Running a local PySpark script on a Spark cluster on Kubernetes

6/23/2019

I have a Spark cluster set up on Kubernetes. To run the spark-app.py script on it, I:

  1. build and push an image containing the spark-app.py script (a rough sketch of how I do this follows the command below)
  2. run the spark-submit command below
./bin/spark-submit \
    --master k8s://https://<master-ip>:<port> \
    --deploy-mode cluster \
    --name spark-app \
    --conf spark.executor.instances=3 \
    --conf spark.kubernetes.container.image=my-repo/spark-py:v2.4.3 \
    --conf spark.kubernetes.namespace=default \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
    --conf spark.kubernetes.container.image.pullPolicy=Always \
    --conf spark.kubernetes.container.image.pullSecrets=<my-secret> \
    --conf spark.kubernetes.pyspark.pythonVersion=3 \
      local:///opt/spark/examples/src/main/python/spark-app.py
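
For reference, step 1 is just a thin layer on top of the stock spark-py image. This is only a sketch: the repository and tag names are placeholders, and I am assuming the base image was produced with Spark's docker-image-tool.sh:

# Dockerfile (sketch): bake spark-app.py into the image referenced above
FROM my-repo/spark-py-base:v2.4.3   # placeholder: base spark-py image built with docker-image-tool.sh
COPY spark-app.py /opt/spark/examples/src/main/python/spark-app.py

# rebuild and push after every edit to spark-app.py (placeholder image name)
docker build -t my-repo/spark-py:v2.4.3 .
docker push my-repo/spark-py:v2.4.3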

But this is slow: every time I edit the script, I have to rebuild and push a new image.

Q1) How can I avoid rebuilding the image every time I edit just the script?

Q2) Is there a way for spark-submit to accept the script directly from my local machine?
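
To make Q2 concrete, this is roughly what I would like to be able to run; the local path below is hypothetical, and I don't know whether spark-submit even supports this in cluster mode on Kubernetes:

./bin/spark-submit \
    --master k8s://https://<master-ip>:<port> \
    --deploy-mode cluster \
    --name spark-app \
    --conf spark.executor.instances=3 \
    --conf spark.kubernetes.container.image=my-repo/spark-py:v2.4.3 \
    --conf spark.kubernetes.namespace=default \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
    --conf spark.kubernetes.container.image.pullPolicy=Always \
    --conf spark.kubernetes.container.image.pullSecrets=<my-secret> \
    --conf spark.kubernetes.pyspark.pythonVersion=3 \
    file:///home/me/projects/spark-app.py   # hypothetical path on my computer, not baked into the image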

-- Ankur Gautam
apache-spark
kubernetes
pyspark

0 Answers