Spark for Kubernetes - Azure Blob Storage credentials issue

10/18/2018

I'm running into the following issue when trying to run Spark on Kubernetes with the application jar stored in an Azure Blob Storage container:

2018-10-18 08:48:54 INFO  DAGScheduler:54 - Job 0 failed: reduce at SparkPi.scala:38, took 1.743177 s
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, 10.244.1.11, executor 2): org.apache.hadoop.fs.azure.AzureException: org.apache.hadoop.fs.azure.AzureException: No credentials found for account datasets83d858296fd0c49b.blob.core.windows.net in the configuration, and its container datasets is not accessible using anonymous credentials. Please check if the container exists first. If it is not publicly available, you have to provide account credentials.
    at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.createAzureStorageSession(AzureNativeFileSystemStore.java:1086)
    at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.initialize(AzureNativeFileSystemStore.java:538)
    at org.apache.hadoop.fs.azure.NativeAzureFileSystem.initialize(NativeAzureFileSystem.java:1366)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3242)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:121)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3291)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3259)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:470)
    at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1897)
    at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:694)
    at org.apache.spark.util.Utils$.fetchFile(Utils.scala:476)
    at org.apache.spark.executor.Executor$anonfun$org$apache$spark$executor$Executor$updateDependencies$5.apply(Executor.scala:755)
    at org.apache.spark.executor.Executor$anonfun$org$apache$spark$executor$Executor$updateDependencies$5.apply(Executor.scala:747)
    at scala.collection.TraversableLike$WithFilter$anonfun$foreach$1.apply(TraversableLike.scala:733)
    at scala.collection.mutable.HashMap$anonfun$foreach$1.apply(HashMap.scala:99)
    at scala.collection.mutable.HashMap$anonfun$foreach$1.apply(HashMap.scala:99)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
    at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$updateDependencies(Executor.scala:747)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:312)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.fs.azure.AzureException: No credentials found for account datasets83d858296fd0c49b.blob.core.windows.net in the configuration, and its container datasets is not accessible using anonymous credentials. Please check if the container exists first. If it is not publicly available, you have to provide account credentials.
    at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.connectUsingAnonymousCredentials(AzureNativeFileSystemStore.java:863)
    at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.createAzureStorageSession(AzureNativeFileSystemStore.java:1081)
    ... 24 more

The command I use to launch the job is:

/opt/spark/bin/spark-submit \
    --master k8s://<my-k8s-master> \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=5 \
    --conf spark.kubernetes.container.image=<my-image-built-with-wasb> \
    --conf spark.kubernetes.namespace=<my-namespace> \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
    --conf spark.kubernetes.driver.secrets.spark=/opt/spark/conf \
    --conf spark.kubernetes.executor.secrets.spark=/opt/spark/conf \
    wasb://<my-container-name>@<my-account-name>.blob.core.windows.net/spark-examples_2.11-2.3.2.jar 10000
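
(For reference, one way to check whether the secret actually ends up mounted in a running executor pod would be something along these lines, assuming kubectl access to the namespace; the pod name is a placeholder taken from kubectl get pods:)

# verify that core-site.xml from the "spark" secret shows up at the configured mount path
kubectl -n <my-namespace> exec <some-executor-pod> -- ls /opt/spark/conf
kubectl -n <my-namespace> exec <some-executor-pod> -- cat /opt/spark/conf/core-site.xml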

I have a k8s secret named spark with the following content:

apiVersion: v1
kind: Secret
metadata:
  name: spark
  labels:
    app: spark
    stack: service
type: Opaque
data:
  core-site.xml: |-
    {% filter b64encode %}
    <configuration>
        <property>
            <name>fs.azure.account.key.<my-account-name>.blob.core.windows.net</name>
            <value><my-account-key></value>
        </property>
        <property>
            <name>fs.AbstractFileSystem.wasb.Impl</name>
            <value>org.apache.hadoop.fs.azure.Wasb</value>
        </property>
    </configuration>
    {% endfilter %}
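
(The {% filter b64encode %} block is just a templating step that base64-encodes the XML before it goes into the secret's data field; a rough non-templated equivalent, assuming a plain core-site.xml file on disk and ignoring the labels, would be:)

# kubectl base64-encodes the file contents itself when building the secret
kubectl -n <my-namespace> create secret generic spark --from-file=core-site.xml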

The driver pod manages to download the jar dependency stored in the Azure Blob Storage container, as can be seen in this log snippet:

2018-10-18 08:48:16 INFO  Utils:54 - Fetching wasb://<my-container-name>@<my-account-name>.blob.core.windows.net/spark-examples_2.11-2.3.2.jar to /var/spark-data/spark-jars/fetchFileTemp8575879929413871510.tmp
2018-10-18 08:48:16 INFO  SparkPodInitContainer:54 - Finished downloading application dependencies.

How can I get the executor pods to pick up the credentials stored in the core-site.xml file that is mounted from the k8s secret? What am I missing?

-- Oscar Bonilla
apache-spark
azure
azure-blob-storage
kubernetes

1 Answer

10/23/2018

I solved it by adding the following configuration to spark-submit:

--conf spark.hadoop.fs.AbstractFileSystem.wasb.Impl=org.apache.hadoop.fs.azure.Wasb
--conf spark.hadoop.fs.azure.account.key.${STORAGE_ACCOUNT_NAME}.blob.core.windows.net=${STORAGE_ACCOUNT_KEY}
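
As far as I can tell, this works because spark.hadoop.* options are copied into the Hadoop configuration used by the driver and the executors, so the executors no longer depend on reading core-site.xml from the mounted secret. (Side note: the Hadoop documentation spells the first key as fs.AbstractFileSystem.wasb.impl, with a lowercase impl.) A sketch of the full command with the fix applied; the placeholders and the STORAGE_ACCOUNT_NAME / STORAGE_ACCOUNT_KEY environment variables are assumed to be set beforehand:

# the command from the question, plus the two new --conf lines
/opt/spark/bin/spark-submit \
    --master k8s://<my-k8s-master> \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=5 \
    --conf spark.kubernetes.container.image=<my-image-built-with-wasb> \
    --conf spark.kubernetes.namespace=<my-namespace> \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
    --conf spark.kubernetes.driver.secrets.spark=/opt/spark/conf \
    --conf spark.kubernetes.executor.secrets.spark=/opt/spark/conf \
    --conf spark.hadoop.fs.AbstractFileSystem.wasb.Impl=org.apache.hadoop.fs.azure.Wasb \
    --conf spark.hadoop.fs.azure.account.key.${STORAGE_ACCOUNT_NAME}.blob.core.windows.net=${STORAGE_ACCOUNT_KEY} \
    wasb://<my-container-name>@<my-account-name>.blob.core.windows.net/spark-examples_2.11-2.3.2.jar 10000
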
-- Oscar Bonilla
Source: StackOverflow