Spark operator stopped working with Kubernetes

11/5/2019

The Spark Operator was working fine, but it suddenly stopped working and now gives the following error. There has been no change in the Kubernetes version or the spark-operator version.

Exception in thread "kubernetes-dispatcher-0" java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@1ca8f413 rejected from java.util.concurrent.ScheduledThreadPoolExecutor@23fb84b3[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
    at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
    at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
    at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:326)
    at java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:533)
    at java.util.concurrent.ScheduledThreadPoolExecutor.submit(ScheduledThreadPoolExecutor.java:632)
    at java.util.concurrent.Executors$DelegatedExecutorService.submit(Executors.java:678)
    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.scheduleReconnect(WatchConnectionManager.java:303)
    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.access$800(WatchConnectionManager.java:48)
    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$2.onFailure(WatchConnectionManager.java:216)
    at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:543)
    at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:185)
    at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141)
    at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: 
    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$2.onFailure(WatchConnectionManager.java:201)
    at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:543)
    at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:185)
    at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141)
    at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
-- Kite
apache-spark
kubernetes
scala

1 Answer

11/8/2019

I had the same problem (around the 5th of November). It was probably caused by the auto-upgrade of the GKE master and node pools (see the GKE release notes).
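To confirm the upgrade was the trigger, it helps to check what master version the cluster is now running. A minimal check, assuming kubectl and gcloud are pointed at the affected cluster (the cluster name and zone below are placeholders):

    # Print client and API server versions; a server bump with no client change
    # points at the auto-upgrade as the cause
    kubectl version

    # Placeholder cluster name/zone: read the current GKE master version directly
    gcloud container clusters describe my-cluster --zone us-central1-a \
        --format='value(currentMasterVersion)'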

Changing the Kubernetes client version did the trick: I removed $SPARK_HOME/jars/kubernetes-client-4.1.2.jar and replaced it with $SPARK_HOME/jars/kubernetes-client-4.4.2.jar (link to download the new client)
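In shell terms the swap looks roughly like this. This is a sketch rather than an exact transcript of what I ran: the download URL assumes the standard Maven Central layout for the io.fabric8:kubernetes-client:4.4.2 coordinates, and the old jar is backed up instead of deleted:

    # Move the fabric8 client jar bundled with Spark out of the way
    mv "$SPARK_HOME/jars/kubernetes-client-4.1.2.jar" /tmp/

    # Fetch the newer client from Maven Central (standard repository layout)
    curl -fLo "$SPARK_HOME/jars/kubernetes-client-4.4.2.jar" \
        https://repo1.maven.org/maven2/io/fabric8/kubernetes-client/4.4.2/kubernetes-client-4.4.2.jar

Note that if your Spark driver and executors run from a container image, the jar has to be replaced inside that image as well, not just on the machine you submit from.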

Found the solution in an issue opened in the GoogleCloudPlatform/spark-on-k8s-operator project (link to the issue)

-- Tikchbila
Source: StackOverflow