How to deploy Spark on KubeEdge?

11/17/2019

I tried to deploy Spark 2.4.3 on KubeEdge 1.1.0 using Spark's Kubernetes deployment mode, but it failed (Docker 19.03.4, Kubernetes 1.16.1).

# Address the driver binds to on the submitting host
SPARK_DRIVER_BIND_ADDRESS=10.4.20.34
SPARK_IMAGE=spark:2.4.3

# Kubernetes API server, reached over the insecure local port
SPARK_MASTER="k8s://http://127.0.0.1:8080"

CMD=(
    "$SPARK_HOME/bin/spark-submit"
    --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS"
    --conf "spark.kubernetes.container.image=${SPARK_IMAGE}"
    --conf "spark.executor.instances=1"
    --conf "spark.kubernetes.executor.limit.cores=1"
    --deploy-mode client
    --master "${SPARK_MASTER}"
    --name spark-pi
    --class org.apache.spark.examples.SparkPi
    --driver-memory 1G
    --executor-memory 1G
    --num-executors 1
    --executor-cores 1
    "file://${PWD}/spark-examples_2.11-2.4.3.jar"
)

# Quote the expansion so each array element stays a single argument
"${CMD[@]}"

Both nodes report Ready:

kubectl get nodes
NAME             STATUS   ROLES    AGE    VERSION
edge-node-001    Ready    edge     6d1h   v1.15.3-kubeedge-v1.1.0-beta.0.178+c6a5aa738261e7-dirty
ubuntu-ms-7b89   Ready    master   6d4h   v1.16.1

But the job never starts running, and the driver keeps logging the following warning:

19/11/17 21:45:12 INFO k8s.ExecutorPodsAllocator: Going to request 1 executors from Kubernetes.
19/11/17 21:45:12 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 46571.
19/11/17 21:45:12 INFO netty.NettyBlockTransferService: Server created on 10.4.20.34:46571
19/11/17 21:45:12 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
19/11/17 21:45:12 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.4.20.34, 46571, None)
19/11/17 21:45:12 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.4.20.34:46571 with 366.3 MB RAM, BlockManagerId(driver, 10.4.20.34, 46571, None)
19/11/17 21:45:12 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.4.20.34, 46571, None)
19/11/17 21:45:12 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.4.20.34, 46571, None)
19/11/17 21:45:12 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@451882b2{/metrics/json,null,AVAILABLE,@Spark}
19/11/17 21:45:42 INFO k8s.KubernetesClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
19/11/17 21:45:42 INFO spark.SparkContext: Starting job: reduce at SparkPi.scala:38
19/11/17 21:45:42 INFO scheduler.DAGScheduler: Got job 0 (reduce at SparkPi.scala:38) with 2 output partitions
19/11/17 21:45:42 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:38)
19/11/17 21:45:42 INFO scheduler.DAGScheduler: Parents of final stage: List()
19/11/17 21:45:42 INFO scheduler.DAGScheduler: Missing parents: List()
19/11/17 21:45:42 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents
19/11/17 21:45:42 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1936.0 B, free 366.3 MB)
19/11/17 21:45:42 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1256.0 B, free 366.3 MB)
19/11/17 21:45:42 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.4.20.34:46571 (size: 1256.0 B, free: 366.3 MB)
19/11/17 21:45:42 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1161
19/11/17 21:45:42 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34) (first 15 tasks are for partitions Vector(0, 1))
19/11/17 21:45:42 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
19/11/17 21:45:57 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
19/11/17 21:46:12 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
19/11/17 21:46:27 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
19/11/17 21:46:42 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
19/11/17 21:46:57 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
19/11/17 21:47:12 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
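
The repeated warning indicates that no executor has registered with the driver, even though one was requested at 21:45:12. A minimal sketch for narrowing this down is to look at the executor pods themselves. It assumes that kubectl is configured for this cluster, that the executors are created in the `default` namespace (Spark's default when `spark.kubernetes.namespace` is not set), and that the pods carry Spark's usual `spark-role=executor` label. The function names and the `KUBECTL` and `NAMESPACE` variables are hypothetical helpers, not part of Spark or KubeEdge.

```shell
# Diagnostic sketch, not a fix: inspect the executor pods requested by spark-submit.
# Assumptions: kubectl can reach the cluster; executors live in the "default"
# namespace; pods are labeled spark-role=executor by Spark.

KUBECTL="${KUBECTL:-kubectl}"        # hypothetical override for the kubectl binary
NAMESPACE="${NAMESPACE:-default}"    # hypothetical override for the namespace

# Show each executor pod with its phase and the node it was scheduled on.
list_executor_pods() {
    "$KUBECTL" get pods -n "$NAMESPACE" -l spark-role=executor -o wide
}

# Show the events for each executor pod (scheduling, image pull, container start).
describe_executor_pods() {
    "$KUBECTL" describe pods -n "$NAMESPACE" -l spark-role=executor
}
```

Running `list_executor_pods` and then `describe_executor_pods` while the warning is repeating may show whether the executor pod is stuck in Pending, cannot pull `spark:2.4.3` on edge-node-001, or is running but unable to connect back to the driver at 10.4.20.34.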

Is it possible to deploy Spark on KubeEdge using the Kubernetes deployment mode, or should I try standalone deployment mode instead?

I'm so confused.

-- bito sky
apache-spark
kubernetes
spark-submit

0 Answers