Running Spark 2.3 on Kubernetes with remote dependencies on S3

4/29/2018

I am running spark-submit against a Kubernetes cluster (Spark 2.3). My problem is that the init-container does not download my jar file when it is specified as an s3a:// path, but it does work if I put the jar on an HTTP server and use http://. The Spark driver then fails, of course, because it can't find my class (and the jar file is in fact not in the image).

I have tried two approaches:

  1. specifying the s3a:// path to the jar as the application argument to spark-submit, and
  2. using --jars to point at the jar's s3a:// location.

Both fail in the same way.
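For reference, the two invocations look roughly like this. All names here (API server URL, image, bucket, class) are placeholders, not my real values:

```shell
# Approach 1: jar given as the main application resource (hypothetical names/paths).
spark-submit \
  --master k8s://https://my-k8s-apiserver:6443 \
  --deploy-mode cluster \
  --name my-app \
  --class com.example.MyApp \
  --conf spark.kubernetes.container.image=my-repo/spark:2.3.0 \
  s3a://my-bucket/jars/app.jar

# Approach 2: same jar additionally listed via --jars; fails identically.
spark-submit \
  --master k8s://https://my-k8s-apiserver:6443 \
  --deploy-mode cluster \
  --name my-app \
  --class com.example.MyApp \
  --conf spark.kubernetes.container.image=my-repo/spark:2.3.0 \
  --jars s3a://my-bucket/jars/app.jar \
  s3a://my-bucket/jars/app.jar
```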

edit: using local:///home/myuser/app.jar also fails with the same symptoms.

On a failed run (s3a dependency), I logged into the container and found the directory /var/spark-data/spark-jars/ to be empty. The init-container logs show no error of any kind.
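This is roughly how I inspected the pod. The pod selector and the init-container name are my assumptions (I believe Spark 2.3 names the init-container spark-init, but that may vary):

```shell
# Find the driver pod (label is an assumption; adjust to your deployment).
kubectl get pods -l spark-role=driver

# Logs of the init-container that is supposed to fetch remote dependencies.
kubectl logs <driver-pod-name> -c spark-init

# Check the download directory inside the main container.
kubectl exec <driver-pod-name> -- ls -la /var/spark-data/spark-jars/
```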

Questions:

  1. What is the correct way to specify a remote dependency on s3a?
  2. Is s3a not supported yet, only http(s)?
  3. Any suggestions on how to further debug the init-container to determine why the download doesn't happen?
-- joshuarobinson
apache-spark
kubernetes

0 Answers