I have a trivially small Spark application written in Java that I am trying to run in a K8s cluster using spark-submit. I built an image with the Spark binaries, my uber-JAR with all necessary dependencies (in /opt/spark/jars/my.jar), and a config file (in /opt/spark/conf/some.json).
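For reference, I submit the job roughly along these lines (the API server address, image name, and main class below are placeholders, not my exact values):

spark-submit \
  --master k8s://https://<k8s-apiserver>:<port> \
  --deploy-mode cluster \
  --conf spark.kubernetes.container.image=<my-image> \
  --class <my.main.Class> \
  local:///opt/spark/jars/my.jar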
In my code, I start with
SparkSession session = SparkSession.builder()
        .appName("myapp")
        .config("spark.logConf", "true")
        .getOrCreate();
Path someFilePath = FileSystems.getDefault().getPath("/opt/spark/conf/some.json");
String someString = new String(Files.readAllBytes(someFilePath));
and get this exception at readAllBytes from the Spark driver:
java.nio.file.NoSuchFileException: /opt/spark/conf/some.json
If I run my Docker image manually, I can definitely see the file /opt/spark/conf/some.json, as I expect. My Spark job runs as root, so file permissions should not be a problem.
I have been assuming that, since the same Docker image (with the file indeed present) is used to start the driver (and the executors, though I don't even get that far), the file should be available to my application. Is that not so? Why wouldn't it see the file?
You seem to be getting this exception from one of your worker nodes, not from the container. Make sure you have specified every file your job needs via the --files option of spark-submit.
spark-submit --master k8s://https://<k8s-apiserver>:<port> --deploy-mode cluster --files <local file dependencies> ...
https://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management
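Files distributed with --files are made available to the driver and the executors, and you can resolve them with SparkFiles instead of a hard-coded path baked into the image. A minimal sketch, assuming you pass some.json via --files (the local submission path is up to you):

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.spark.SparkFiles;
import org.apache.spark.sql.SparkSession;

public class MyApp {
    public static void main(String[] args) throws Exception {
        SparkSession session = SparkSession.builder()
                .appName("myapp")
                .config("spark.logConf", "true")
                .getOrCreate();

        // SparkFiles.get resolves a file shipped via --files (or SparkContext.addFile)
        // to its absolute path on the local node.
        Path someFilePath = Paths.get(SparkFiles.get("some.json"));
        String someString = new String(Files.readAllBytes(someFilePath));

        System.out.println(someString);
        session.stop();
    }
}

This way the read does not depend on where the file happens to live inside the image.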