I followed the Spark on Kubernetes blog and got to the point where the job runs, but it fails inside the worker pods with a file access error.
2018-05-22 22:20:51 WARN TaskSetManager:66 - Lost task 0.0 in stage 0.0 (TID 0, 172.17.0.15, executor 3): java.nio.file.AccessDeniedException: ./spark-examples_2.11-2.3.0.jar
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:84)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixCopyFile.copyFile(UnixCopyFile.java:243)
at sun.nio.fs.UnixCopyFile.copy(UnixCopyFile.java:581)
at sun.nio.fs.UnixFileSystemProvider.copy(UnixFileSystemProvider.java:253)
at java.nio.file.Files.copy(Files.java:1274)
at org.apache.spark.util.Utils$.org$apache$spark$util$Utils$copyRecursive(Utils.scala:632)
at org.apache.spark.util.Utils$.copyFile(Utils.scala:603)
at org.apache.spark.util.Utils$.fetchFile(Utils.scala:478)
at org.apache.spark.executor.Executor$anonfun$org$apache$spark$executor$Executor$updateDependencies$5.apply(Executor.scala:755)
at org.apache.spark.executor.Executor$anonfun$org$apache$spark$executor$Executor$updateDependencies$5.apply(Executor.scala:747)
at scala.collection.TraversableLike$WithFilter$anonfun$foreach$1.apply(TraversableLike.scala:733)
at scala.collection.mutable.HashMap$anonfun$foreach$1.apply(HashMap.scala:99)
at scala.collection.mutable.HashMap$anonfun$foreach$1.apply(HashMap.scala:99)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$updateDependencies(Executor.scala:747)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:312)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
The command I use to run the SparkPi example is:
$DIR/$SPARKVERSION/bin/spark-submit \
--master=k8s://https://192.168.99.101:8443 \
--deploy-mode=cluster \
--conf spark.executor.instances=3 \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.kubernetes.container.image=172.30.1.1:5000/myapp/spark-docker:latest \
--conf spark.kubernetes.namespace=$namespace \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.driver.pod.name=spark-pi-driver \
local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
Working through the code, it seems the Spark jar files are being copied to an internal location inside the container. But:
RBAC has been set up as follows (oc get rolebinding -n myapp):
NAME         ROLE     USERS       GROUPS   SERVICE ACCOUNTS   SUBJECTS
admin        /admin   developer
spark-role   /edit                         spark
And the service accounts (oc get sa -n myapp):
NAME       SECRETS   AGE
builder    2         18d
default    2         18d
deployer   2         18d
pusher     2         13d
spark      2         12d
Or am I doing something silly here?
My Kubernetes system is running inside Docker Machine (via VirtualBox on OS X). I am using:
Any hints on solving this would be greatly appreciated.
I know this is a 5-month-old post, but there doesn't seem to be much information about this issue around, so I'm posting my answer in case it helps someone.
It looks like you are not running the process inside the container as root. If that's the case, take a look at this link (https://github.com/minishift/minishift/issues/2836).
Since it looks like you are also using OpenShift, you can do the following (substitute your own service account and namespace; going by the question, those would be spark and myapp):
oc adm policy add-scc-to-user anyuid -z spark-sa -n spark
In my case I'm using plain Kubernetes and need to use runAsUser:XX, so the container cannot run as root. I therefore gave the group read/write access to /opt/spark inside the container, and that solved the issue. Just add the following line to resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile:
RUN chmod g+rwx -R /opt/spark
Of course, you then have to rebuild the Docker images, either manually or with the provided script as shown below.
./bin/docker-image-tool.sh -r YOUR_REPO -t YOUR_TAG build
./bin/docker-image-tool.sh -r YOUR_REPO -t YOUR_TAG push
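If you want to see what that chmod line changes before rebuilding anything, here is a minimal local sketch. It does not touch a cluster or an image: a temporary directory stands in for /opt/spark, and the work-dir and example.jar names are just placeholders I made up for the demo. It strips the group bits, applies the same recursive group change as the Dockerfile line, and prints the group permission bits before and after.

```shell
#!/bin/sh
# Local simulation only: a temp directory stands in for /opt/spark.
# The names "work-dir" and "example.jar" are hypothetical placeholders.
set -eu

spark_home="$(mktemp -d)/opt/spark"
mkdir -p "$spark_home/work-dir"
touch "$spark_home/work-dir/example.jar"

# Start from a restrictive state: no group permissions at all.
chmod -R g-rwx "$spark_home"
# Characters 5-7 of the `ls -ld` mode string are the group bits.
before="$(ls -ld "$spark_home/work-dir" | cut -c5-7)"

# Same permission change as the Dockerfile line, applied recursively.
chmod -R g+rwx "$spark_home"
after="$(ls -ld "$spark_home/work-dir" | cut -c5-7)"

echo "group bits before: $before"
echo "group bits after:  $after"
```

My understanding is that this helps because a container started with an arbitrary non-root UID usually still runs with the root group (GID 0), which is the group that owns /opt/spark in the image, so group write access is what lets the executor copy the jar into its working directory. Treat that as an assumption and verify it with `id` inside your own pod.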