How to set ephemeral-storage in Spark with Kubernetes

11/30/2018

While running a Spark job on a Kubernetes cluster, we get the following error:

2018-11-30 14:00:47 INFO  DAGScheduler:54 - Resubmitted ShuffleMapTask(1, 58), so marking it as still running.
2018-11-30 14:00:47 WARN  TaskSetManager:66 - Lost task 310.0 in stage 1.0 (TID 311, 10.233.71.29, executor 3): ExecutorLostFailure (executor 3 exited caused by one of the running tasks) Reason: 
The executor with id 3 exited with exit code -1.
The API gave the following brief reason: Evicted
The API gave the following message: The node was low on resource: ephemeral-storage. Container executor was using 515228Ki, which exceeds its request of 0. 
The API gave the following container statuses:

How can we configure the job to increase the ephemeral-storage request of each container?

We use Spark 2.4.0 and Kubernetes 1.12.1.

The spark-submit options are as follows:

--conf spark.local.dir=/mnt/tmp \
--conf spark.executor.instances=4 \
--conf spark.executor.cores=8 \
--conf spark.executor.memory=100g \
--conf spark.driver.memory=4g \
--conf spark.driver.cores=1 \
--conf spark.kubernetes.memoryOverheadFactor=0.1 \
--conf spark.kubernetes.container.image=spark:2.4.0 \
--conf spark.kubernetes.namespace=visionlab \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.container.image.pullPolicy=Always \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.myvolume.options.claimName=pvc \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.myvolume.mount.path=/mnt/ \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.myvolume.mount.readOnly=false \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.myvolume.options.claimName=pvc \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.myvolume.mount.path=/mnt/ \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.myvolume.mount.readOnly=false
-- jrabary
apache-spark
kubernetes

2 Answers

12/1/2018

It looks like your job is requesting 0 ephemeral-storage in the pod spec. If you look at the docs you'll see that ephemeral storage comes out of the root disk of your nodes, so you can try specifying a hostPath mount instead (see the sketch just below).
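
For example, a hostPath scratch volume can be declared with the same spark-submit volume syntax you already use for the PVC; a minimal sketch, where the volume name spark-scratch and both paths are placeholders rather than values from your job:

# volume name and paths below are illustrative placeholders
--conf spark.local.dir=/scratch \
--conf spark.kubernetes.executor.volumes.hostPath.spark-scratch.mount.path=/scratch \
--conf spark.kubernetes.executor.volumes.hostPath.spark-scratch.mount.readOnly=false \
--conf spark.kubernetes.executor.volumes.hostPath.spark-scratch.options.path=/var/spark-scratch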

My guess is that something is going wrong with the PVC and the container is falling back to an ephemeral volume, or that you need both a hostPath and a PVC (for /mnt/tmp) if you are specifying volumes. I would check:

$ kubectl describe pvc
$ kubectl describe pv

As of this writing, there is no Spark configuration option to set a Kubernetes request or limit for ephemeral storage on the driver or executor pods.

-- Rico
Source: StackOverflow

5/22/2019

As @Rico says, there's no way to set ephemeral-storage limits via Spark configuration as of Spark 2.4.3. Instead, you can set default ephemeral-storage requests and limits for all new pods in your namespace using a LimitRange:

apiVersion: v1
kind: LimitRange
metadata:
  name: ephemeral-storage-limit-range
spec:
  limits:
  - default:
      ephemeral-storage: 8Gi
    defaultRequest:
      ephemeral-storage: 1Gi
    type: Container
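
A LimitRange only affects pods created after it exists in the namespace, so apply it before submitting the job (the file name is illustrative; visionlab is the namespace from the question):

$ kubectl apply -n visionlab -f ephemeral-storage-limit-range.yaml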

This applies the defaults to executors created in the LimitRange's namespace:

$ kubectl get pod spark-kub-1558558662866-exec-67 -o json | jq '.spec.containers[0].resources.requests."ephemeral-storage"'
"1Gi"

It's a little heavy-handed because it applies the default to all containers in your namespace, but it may be a solution if your workload is uniform.

-- Shahin
Source: StackOverflow