Spark 2.3 submit on Kubernetes error

3/11/2018

Getting below errors when I’m trying to run spark-submit on k8 cluster

Error 1: This looks like a warning it doesn’t interrupt the app running inside executor pod but keeps on getting this warning

2018-03-09 11:15:21 WARN  WatchConnectionManager:192 - Exec Failure
java.io.EOFException
       at okio.RealBufferedSource.require(RealBufferedSource.java:60)
       at okio.RealBufferedSource.readByte(RealBufferedSource.java:73)
       at okhttp3.internal.ws.WebSocketReader.readHeader(WebSocketReader.java:113)
       at okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:97)
       at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:262)
       at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:201)
       at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141)
       at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)

Error2: This is intermittent error which is failing the executor pod to run

org.apache.spark.SparkException: External scheduler cannot be instantiated
    at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$createTaskScheduler(SparkContext.scala:2747)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:492)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2486)
    at org.apache.spark.sql.SparkSession$Builder$anonfun$7.apply(SparkSession.scala:930)
    at org.apache.spark.sql.SparkSession$Builder$anonfun$7.apply(SparkSession.scala:921)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)
    at com.capitalone.quantum.spark.core.QuantumSession$.initialize(QuantumSession.scala:62)
    at com.capitalone.quantum.spark.core.QuantumSession$.getSparkSession(QuantumSession.scala:80)
    at com.capitalone.quantum.workflow.WorkflowApp$.getSession(WorkflowApp.scala:116)
    at com.capitalone.quantum.workflow.WorkflowApp$.main(WorkflowApp.scala:90)
    at com.capitalone.quantum.workflow.WorkflowApp.main(WorkflowApp.scala)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get]  for kind: [Pod]  with name: [myapp-ef79db3d9f4831bf85bda14145fdf113-driver-driver]  in namespace: [default]  failed.
    at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:62)
    at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:71)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:228)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:184)
    at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterSchedulerBackend.<init>(KubernetesClusterSchedulerBackend.scala:70)
    at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:120)
    at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$createTaskScheduler(SparkContext.scala:2741)
    ... 11 more
Caused by: java.net.UnknownHostException: kubernetes.default.svc: Try again
    at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
    at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
    at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
    at java.net.InetAddress.getAllByName(InetAddress.java:1192)
    at java.net.InetAddress.getAllByName(InetAddress.java:1126)
    at okhttp3.Dns$1.lookup(Dns.java:39)
    at okhttp3.internal.connection.RouteSelector.resetNextInetSocketAddress(RouteSelector.java:171)
    at okhttp3.internal.connection.RouteSelector.nextProxy(RouteSelector.java:137)
    at okhttp3.internal.connection.RouteSelector.next(RouteSelector.java:82)
    at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:171)
    at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:121)
    at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:100)
    at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
    at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
    at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
    at io.fabric8.kubernetes.client.utils.HttpClientUtils$2.intercept(HttpClientUtils.java:93)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
    at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185)
    at okhttp3.RealCall.execute(RealCall.java:69)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:377)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:343)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:312)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:295)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:783)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:217)
    ... 15 more
2018-03-09 15:00:39 INFO  AbstractConnector:318 - Stopped Spark@5f59185e{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2018-03-09 15:00:39 INFO  SparkUI:54 - Stopped Spark web UI at http://myapp-ef79db3d9f4831bf85bda14145fdf113-driver-svc.default.svc:4040
2018-03-09 15:00:39 INFO  MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped!
2018-03-09 15:00:39 INFO  MemoryStore:54 - MemoryStore cleared
2018-03-09 15:00:39 INFO  BlockManager:54 - BlockManager stopped
2018-03-09 15:00:39 INFO  BlockManagerMaster:54 - BlockManagerMaster stopped
2018-03-09 15:00:39 WARN  MetricsSystem:66 - Stopping a MetricsSystem that is not running
2018-03-09 15:00:39 INFO  OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!
2018-03-09 15:00:39 INFO  SparkContext:54 - Successfully stopped SparkContext
Exception in thread "main" org.apache.spark.SparkException: External scheduler cannot be instantiated
    at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$createTaskScheduler(SparkContext.scala:2747)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:492)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2486)
    at org.apache.spark.sql.SparkSession$Builder$anonfun$7.apply(SparkSession.scala:930)
    at org.apache.spark.sql.SparkSession$Builder$anonfun$7.apply(SparkSession.scala:921)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)
    at com.capitalone.quantum.spark.core.QuantumSession$.initialize(QuantumSession.scala:62)
    at com.capitalone.quantum.spark.core.QuantumSession$.getSparkSession(QuantumSession.scala:80)
    at com.capitalone.quantum.workflow.WorkflowApp$.getSession(WorkflowApp.scala:116)
    at com.capitalone.quantum.workflow.WorkflowApp$.main(WorkflowApp.scala:90)
    at com.capitalone.quantum.workflow.WorkflowApp.main(WorkflowApp.scala)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get]  for kind: [Pod]  with name: [myapp-ef79db3d9f4831bf85bda14145fdf113-driver]  in namespace: [default]  failed.
    at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:62)
    at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:71)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:228)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:184)
    at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterSchedulerBackend.<init>(KubernetesClusterSchedulerBackend.scala:70)
    at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:120)
    at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$createTaskScheduler(SparkContext.scala:2741)
    ... 11 more
Caused by: java.net.UnknownHostException: kubernetes.default.svc: Try again
    at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
    at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
    at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
    at java.net.InetAddress.getAllByName(InetAddress.java:1192)
    at java.net.InetAddress.getAllByName(InetAddress.java:1126)
    at okhttp3.Dns$1.lookup(Dns.java:39)
    at okhttp3.internal.connection.RouteSelector.resetNextInetSocketAddress(RouteSelector.java:171)
    at okhttp3.internal.connection.RouteSelector.nextProxy(RouteSelector.java:137)
    at okhttp3.internal.connection.RouteSelector.next(RouteSelector.java:82)
    at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:171)
    at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:121)
    at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:100)
    at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
    at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
    at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
    at io.fabric8.kubernetes.client.utils.HttpClientUtils$2.intercept(HttpClientUtils.java:93)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
    at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185)
    at okhttp3.RealCall.execute(RealCall.java:69)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:377)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:343)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:312)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:295)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:783)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:217)
    ... 15 more
2018-03-09 15:00:39 INFO  ShutdownHookManager:54 - Shutdown hook called
2018-03-09 15:00:39 INFO  ShutdownHookManager:54 - Deleting directory /tmp/spark-5bd85c96-d689-4c53-a0b3-1eadd32357cb

Note:Able to run the application successfully but spark-submit run fails with above error2 very frequently.

-- shiv455
apache-spark
kubernetes

2 Answers

5/10/2019

My ubuntu18.04 PC also encountered this problem, but it has been resolved. The general steps are as follows:

  1. Get the url through the minikube dashboard -url and enter the dashboard. Search for the coredns service to start normally
  2. Check if the coredns service log is caused by Loop ... detected ...
  3. If this problem is caused, vim /etc/resolv.conf, Modify nameserver 127.0.0.53 to nameserver 8.8.8.8.
  4. restart minikube stop && minikube start

If coredns can't be started, you can check it out here: https://github.com/coredns/coredns/tree/master/plugin/loop

-- ideal
Source: StackOverflow

8/9/2018

You need to set ${SPARK_LOCAL_IP} environment variable to pod IP and pass to spark-submit this environment variable using --conf spark.driver.host=${SPARK_LOCAL_IP}.

See the spark docs for more information regarding these variables.

-- Eran Levy
Source: StackOverflow