Unable to read local files in spark kubernetes cluster mode

11/26/2019

I am facing an issue while reading a file stored on my local machine from a Spark program running in cluster mode. It gives a "File not found" error even though the file is present at the specified location. Please suggest how I can read a local file in a Spark cluster running on Kubernetes.

-- harshit saxena
apache-spark
file
kubernetes
scala
server

1 Answer

11/26/2019

You cannot refer to local files on your machine when you submit Spark to Kubernetes.

The available solutions for your case might be:

  • Use a Resource Staging Server. It is not available in the main branch of the Apache Spark codebase, so the whole integration is on your side.
  • Put your file in an HTTP- or HDFS-accessible location: refer to the docs.
  • Put your file inside the Spark Docker image and refer to it as local:///path/to/your-file.jar (see the sketch after this list).
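For example, a minimal spark-submit sketch for the last two options, assuming a hypothetical application jar baked into the image at /opt/spark/jars/my-app.jar and an input file already uploaded to HDFS (the image name, main class, and paths are placeholders, not taken from your setup):

    # local:// resolves inside the driver/executor containers, not on your laptop;
    # the input path must be reachable from the pods (HDFS/HTTP), not a local file.
    spark-submit \
      --master k8s://https://<k8s-apiserver-host>:6443 \
      --deploy-mode cluster \
      --name my-spark-app \
      --class com.example.Main \
      --conf spark.kubernetes.container.image=my-registry/spark-app:latest \
      local:///opt/spark/jars/my-app.jar \
      hdfs://namenode:8020/data/input.txt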

If you are running a local Kubernetes cluster like Minikube, you can also create a Kubernetes Volume containing the files you are interested in and mount it into the Spark pods: refer to the docs. Be sure to mount that volume on both the Driver and the Executors, as in the sketch below.
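A hedged sketch of the volume mount, assuming a hostPath volume named data-vol pointing at /data on the Minikube node (the volume name and paths are illustrative assumptions; the spark.kubernetes.*.volumes.* properties are available in Spark 2.4+):

    # Mount the same hostPath volume into both the driver and the executors,
    # then the application can read the file via a file:///mnt/data/... path.
    spark-submit \
      --master k8s://https://$(minikube ip):8443 \
      --deploy-mode cluster \
      --conf spark.kubernetes.container.image=my-registry/spark-app:latest \
      --conf spark.kubernetes.driver.volumes.hostPath.data-vol.mount.path=/mnt/data \
      --conf spark.kubernetes.driver.volumes.hostPath.data-vol.options.path=/data \
      --conf spark.kubernetes.executor.volumes.hostPath.data-vol.mount.path=/mnt/data \
      --conf spark.kubernetes.executor.volumes.hostPath.data-vol.options.path=/data \
      local:///opt/spark/jars/my-app.jar \
      file:///mnt/data/input.txt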

-- Aliaksandr Sasnouskikh
Source: StackOverflow