Error in Spark workers when running in OpenShift/Kubernetes

8/5/2021

I have brought up a standalone cluster in OpenShift, using separate pods for the spark-master and spark-worker. The Spark worker is up and I can see it in the Spark master UI. When I try to submit a job (an Apache Beam pipeline with the MinimumWordCount example) using spark-submit from another pod in the same namespace, I get the following error on the worker node.

Failed to connect to spark-submit:39046

The hostname of my spark-submit pod is spark-submit, but I am not sure why the worker is trying to connect to spark-submit, or what port 39046 is. My understanding was that the spark-master drives all the execution, and that a spark-worker should not need to connect back to the spark-submit node. Am I missing anything here?

Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult: 
        at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
        at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
        at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:201)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:65)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:64)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
        ... 4 more
Caused by: java.io.IOException: Failed to connect to spark-submit:39046
        at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
        at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
        at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:198)
        at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
        at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: spark-submit
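For reference, this is roughly how I understand the driver endpoint can be pinned explicitly at submit time; the service name, ports, and jar name below are illustrative placeholders, not my actual values:

```shell
# Sketch only: advertise a resolvable driver host and a fixed driver port,
# so executors know where to connect back. Names/ports are placeholders.
spark-submit \
  --master spark://spark-master:7077 \
  --deploy-mode client \
  --conf spark.driver.host=spark-driver-svc \
  --conf spark.driver.port=35000 \
  --conf spark.driver.bindAddress=0.0.0.0 \
  my-app.jar
```

My assumption is that in client mode the executors connect back to the driver at `spark.driver.host:spark.driver.port`, so whatever name is advertised there has to be resolvable from the worker pods (e.g. via a headless Service pointing at the spark-submit pod).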
-- ASD
apache-beam
apache-spark
kubernetes
openshift
spark-submit

0 Answers