Spark on Google Cloud Kubernetes cluster keeps evicting executors: workers are registered and have sufficient resources

4/18/2020

I followed the instructions below, except that instead of Minikube I used a Google Cloud Platform Kubernetes cluster (Spark 2.3.2):

https://testdriven.io/blog/deploying-spark-on-kubernetes/

When I submit Spark jobs with:

./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://spark-master:7077 \
  --executor-memory 471859200 \
  --total-executor-cores 20 \
  --deploy-mode cluster \
  /opt/spark/examples/jars/spark-examples_2.11-2.3.2.jar \
  10
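
For comparison, a minimal sketch of the same submit with an explicit size suffix on --executor-memory, which takes a JVM-style memory string (e.g. 512m, 2g); the 512m value is an assumed placeholder:

# same job, with an explicit memory unit instead of a bare number
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://spark-master:7077 \
  --executor-memory 512m \
  --total-executor-cores 20 \
  --deploy-mode cluster \
  /opt/spark/examples/jars/spark-examples_2.11-2.3.2.jar \
  10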

or simply open a Spark shell using:

/opt/spark/bin/spark-shell --master spark://spark-master:7077
sc.makeRDD(List(1,2,4,4)).count
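
The same smoke test also works with explicitly capped executor resources; the 512m / 2-core values below are assumptions for illustration:

# open the shell with small, explicit resource caps (assumed values)
/opt/spark/bin/spark-shell \
  --master spark://spark-master:7077 \
  --executor-memory 512m \
  --total-executor-cores 2

scala> sc.makeRDD(List(1, 2, 4, 4)).count   // expected result: 4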

I keep getting the WARN messages below:

2020-04-18 21:14:38 WARN  TaskSchedulerImpl:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2020-04-18 21:14:53 WARN  TaskSchedulerImpl:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

On the Spark UI, I can see all my worker nodes, which I can easily scale via:

kubectl scale deployment spark-worker --replicas 2   # or any other number; works fine
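
To confirm from kubectl that the workers are up and registered, the pods and one worker's log can be inspected; this sketch assumes the deployment name spark-worker from the scale command above:

# list the worker pods and the nodes they were scheduled on
kubectl get pods -o wide

# tail one worker's log and look for its registration message from the master
kubectl logs deployment/spark-worker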

I see a new running application on the Spark UI, which keeps evicting and relaunching executors. I watched the count climb to 309 executors before I killed the job from the Spark UI.
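
Each executor's stderr on the worker usually records why it exited; a sketch, assuming the standalone default work directory under /opt/spark and hypothetical pod, application, and executor IDs:

# list the application work directories on one worker pod (pod name is hypothetical)
kubectl exec -it spark-worker-xxxxxxxxxx-yyyyy -- ls /opt/spark/work

# read one executor's stderr (application and executor IDs are hypothetical)
kubectl exec -it spark-worker-xxxxxxxxxx-yyyyy -- \
  cat /opt/spark/work/app-20200418211400-0000/0/stderr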

Local mode runs successfully:

/opt/spark/bin/spark-submit --class org.apache.spark.examples.SparkPi --master local[2] /opt/spark/examples/jars/spark-examples_2.11-2.3.2.jar 10

I run all my spark-submit commands from the master Kubernetes pod:

kubectl exec -it spark-master-dc7d76bf5-dthvn -- bash
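
The web UIs can also be reached from outside the cluster via port-forwarding; this assumes the default Spark ports (8080 for the master UI, and 4040 for the driver UI when the driver runs in that pod, e.g. a client-mode spark-shell):

# forward the standalone master web UI to localhost:8080
kubectl port-forward spark-master-dc7d76bf5-dthvn 8080:8080

# forward the driver UI while an application is running in this pod
kubectl port-forward spark-master-dc7d76bf5-dthvn 4040:4040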

What am I doing wrong? Please let me know what other system details you want from me. Thanks.

Edit: adding a Spark UI screenshot of the Executors tab: [screenshot]

Worker log: https://drive.google.com/file/d/1xU07m_OB1BEzJXyJ30WzvA5vcrpVmxYj/view?usp=sharing

Master log: spark on K8 master log

-- sumon c
apache-spark
google-kubernetes-engine
kubernetes
pyspark
spark-submit

0 Answers