file access error running spark on kubernetes

5/22/2018

I followed the Spark on Kubernetes blog but got to a point where it runs the job but fails inside the worker pods with a file access error.

2018-05-22 22:20:51 WARN  TaskSetManager:66 - Lost task 0.0 in stage 0.0 (TID 0, 172.17.0.15, executor 3): java.nio.file.AccessDeniedException: ./spark-examples_2.11-2.3.0.jar
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:84)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixCopyFile.copyFile(UnixCopyFile.java:243)
at sun.nio.fs.UnixCopyFile.copy(UnixCopyFile.java:581)
at sun.nio.fs.UnixFileSystemProvider.copy(UnixFileSystemProvider.java:253)
at java.nio.file.Files.copy(Files.java:1274)
at org.apache.spark.util.Utils$.org$apache$spark$util$Utils$$copyRecursive(Utils.scala:632)
at org.apache.spark.util.Utils$.copyFile(Utils.scala:603)
at org.apache.spark.util.Utils$.fetchFile(Utils.scala:478)
at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:755)
at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:747)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:747)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:312)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

The command I use to run the SparkPi example is:

$DIR/$SPARKVERSION/bin/spark-submit \
--master=k8s://https://192.168.99.101:8443 \
--deploy-mode=cluster \
--conf spark.executor.instances=3 \
--name spark-pi  \
--class org.apache.spark.examples.SparkPi \
--conf spark.kubernetes.container.image=172.30.1.1:5000/myapp/spark-docker:latest \
--conf spark.kubernetes.namespace=$namespace \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.driver.pod.name=spark-pi-driver \
 local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar

On working through the code, it seems that the Spark jar files are being copied to an internal location inside the container. But:

  1. Should this happen, since they are local and already there?
  2. If they do need to be copied to another location in the container, how do I make that part of the container writable, since it is created by the master node? (One way to check this is sketched below.)
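
To check, one option is to exec into one of the running executor pods while the job is up (the pod name below is a placeholder; pick one from oc get pods -n myapp):

# "id" shows the UID/GID the executor container runs as; "pwd" and "ls -ld ."
# show whether its working directory (the "." in the AccessDeniedException
# above) is writable by that user.
oc exec -n myapp <spark-pi-executor-pod> -- sh -c 'id; pwd; ls -ld .'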

RBAC has been set up as follows (oc get rolebinding -n myapp):

NAME                     ROLE                    USERS       GROUPS                         SERVICE ACCOUNTS   SUBJECTS
admin                    /admin                  developer                                                     
spark-role               /edit                                                              spark         
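
For reference, the spark-role binding above corresponds to granting the edit role to the spark service account, along the lines of:

# Grant the edit role to the spark service account in the myapp namespace.
oc policy add-role-to-user edit -z spark -n myapp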

And the service account (oc get sa -n myapp)

NAME       SECRETS   AGE
builder    2         18d
default    2         18d
deployer   2         18d
pusher     2         13d
spark      2         12d

Or am I doing something silly here?

My Kubernetes system is running inside Docker Machine (via VirtualBox on OS X). I am using:

  • openshift v3.9.0+d0f9aed-12
  • kubernetes v1.9.1+a0ce1bc657

Any hints on solving this would be greatly appreciated.

-- Ben
apache-spark
kubernetes
openshift

1 Answer

11/7/2018

I know this is a 5-month-old post, but there doesn't seem to be much information about this issue around, so I'm posting my answer in case it can help someone.

It looks like you are not running the process inside the container as root. If that's the case, you can take a look at this link: https://github.com/minishift/minishift/issues/2836.

Since it looks like you are also using OpenShift, you can do:

oc adm policy add-scc-to-user anyuid -z spark-sa -n spark
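
Adjust the service account and namespace to your setup; with the spark service account in the myapp namespace from your question, that would presumably be:

# Let pods running under the spark service account in myapp run with any UID.
oc adm policy add-scc-to-user anyuid -z spark -n myapp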

In my case I'm using Kubernetes and I need to use runAsUser:XX, so instead I gave group read/write access to /opt/spark inside the container, and that solved the issue. Just add the following line to resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile:

RUN chmod g+rwx -R /opt/spark

Of course you have to rebuild the Docker images, either manually or using the provided script as shown below.

./bin/docker-image-tool.sh -r YOUR_REPO -t YOUR_TAG build
./bin/docker-image-tool.sh -r YOUR_REPO -t YOUR_TAG push
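
If you want to sanity-check the permissions before redeploying, something like this should work (docker-image-tool.sh tags the base image as YOUR_REPO/spark:YOUR_TAG, and --entrypoint bypasses the image's entrypoint script):

# Confirm the group write bit is now set on /opt/spark in the rebuilt image.
docker run --rm --entrypoint ls YOUR_REPO/spark:YOUR_TAG -ld /opt/spark
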
-- Carlos Rocha
Source: StackOverflow