Spark in Kubernetes container does not see local file

7/12/2020

I have a trivially small Spark application written in Java that I am trying to run in a K8s cluster using spark-submit. I built an image with Spark binaries, my uber-JAR file with all necessary dependencies (in /opt/spark/jars/my.jar), and a config file (in /opt/spark/conf/some.json).

In my code, I start with

import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;

import org.apache.spark.sql.SparkSession;

SparkSession session = SparkSession.builder()
        .appName("myapp")
        .config("spark.logConf", "true")
        .getOrCreate();

Path someFilePath = FileSystems.getDefault().getPath("/opt/spark/conf/some.json");
String someString = new String(Files.readAllBytes(someFilePath));

and the readAllBytes call throws this exception in the Spark driver:

java.nio.file.NoSuchFileException: /opt/spark/conf/some.json

If I run my Docker image manually, I can definitely see the file /opt/spark/conf/some.json as I expect. My Spark job runs as root, so file permissions should not be a problem.

I had assumed that, since the same Docker image (with the file present in it) is used to start the driver and the executors (though I don't even get that far), the file would be available to my application. Is that not the case? Why can't the driver see the file?

-- mustaccio
apache-spark
java
kubernetes
nosuchfileexception
spark-submit

1 Answer

7/13/2020

You seem to be getting this exception from one of your worker nodes, not from the container.

Make sure that you've specified all required files via the --files option of spark-submit.

spark-submit --master yarn --deploy-mode cluster --files <local file dependencies> ...

https://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management
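
Files listed in --files are distributed with the application, and their path at run time can typically be resolved with SparkFiles.get (on Kubernetes the --master URL would take the form k8s://https://<api-server>:<port> rather than yarn). Below is a minimal sketch of how the driver could read a file shipped this way, assuming the job was submitted with --files /opt/spark/conf/some.json; the class name ReadShippedConfig is only illustrative.

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.spark.SparkFiles;
import org.apache.spark.sql.SparkSession;

public class ReadShippedConfig {
    public static void main(String[] args) throws Exception {
        SparkSession session = SparkSession.builder()
                .appName("myapp")
                .config("spark.logConf", "true")
                .getOrCreate();

        // Resolve the file shipped via --files; SparkFiles.get returns
        // its absolute path inside Spark's staging directory.
        Path someFilePath = Paths.get(SparkFiles.get("some.json"));
        String someString = new String(Files.readAllBytes(someFilePath));

        System.out.println(someString);
        session.stop();
    }
}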

-- andreoss
Source: StackOverflow