Error in workers when running Spark on Kubernetes

7/25/2019

I'm running Spark 2.4.1 in client mode on Kubernetes.

I'm trying to submit a task from a pod containing spark that will launch 2 executor pods. The command is the following:

bin/spark-shell \
--master k8s://https://$KUBERNETES_SERVICE_HOST:$KUBERNETES_SERVICE_PORT \
--deploy-mode client \
--conf spark.executor.instances=2 \
--conf spark.kubernetes.container.image=$SPARK_IMAGE \
--conf spark.kubernetes.driver.pod.name=$HOSTNAME \
--conf spark.kubernetes.executor.podNamePrefix=spark-exec \
--conf spark.ui.port=4040

These executor pods are created but keep failing with the error:

Caused by: java.io.IOException: Failed to connect to spark-57b8f99554-7nd45:4444
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:198)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748) 
Caused by: java.net.UnknownHostException: spark-57b8f99554-7nd45
at java.net.InetAddress.getAllByName0(InetAddress.java:1281)
at java.net.InetAddress.getAllByName(InetAddress.java:1193)
at java.net.InetAddress.getAllByName(InetAddress.java:1127)
at java.net.InetAddress.getByName(InetAddress.java:1077)
at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:146)
at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:143)
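For reference, the inner UnknownHostException is plain DNS resolution failing, and the same error can be reproduced outside Spark (a minimal sketch, assuming python3 is available in the pod):

```shell
# Resolving the bare driver pod name fails because nothing publishes
# a DNS record for it; Kubernetes only creates records for services.
python3 - <<'EOF'
import socket
try:
    socket.gethostbyname("spark-57b8f99554-7nd45")
    print("resolved")
except socket.gaierror:
    print("resolution failed")
EOF
```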

It seems that the executor pods can't reach the driver pod (spark-57b8f99554-7nd45). It should be related to THIS point, but I can't figure out how to solve it. Any ideas?

-- Maucan
apache-spark
kubernetes
kubernetes-pod
pyspark

1 Answer

3/4/2020

To run Spark in client mode on Kubernetes pods, you need to follow these steps:

  1. Create a headless service like this one:

    apiVersion: v1
    kind: Service
    metadata:
      name: yoursparkapp
    spec:
      clusterIP: "None"
      selector:
        spark-app-selector: yoursparkapp
      ports:
        - name: driver-rpc-port
          protocol: TCP
          port: 7078
          targetPort: 7078
        - name: blockmanager
          protocol: TCP
          port: 7079
          targetPort: 7079

Be careful with the selector section (spark-app-selector: yoursparkapp): it must match the label on the pod from which spark-submit will be run.

Install the above service in your cluster with this command: kubectl create -f yoursparkappservice.yml -n your_namespace

  2. Run a pod carrying that label, so the service selects it:

    kubectl run \
    -n your_namespace -i --tty yoursparkapp \
    --restart=Never \
    --overrides='{ "apiVersion": "v1", "metadata": { "labels": { "spark-app-selector": "yoursparkapp" } } }' \
    --image=your_container:latest -- /bin/bash
    

For the labels we are using "spark-app-selector" : "yoursparkapp", so this pod is selected by the service created in step 1.
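A malformed --overrides string is easy to get wrong, so it can help to validate the JSON locally before passing it to kubectl (a sketch, assuming python3 and its standard json module are available):

```shell
# Parse the overrides JSON and print the label the service selector
# must match; a syntax error would make json.load raise instead.
OVERRIDES='{ "apiVersion": "v1", "metadata": { "labels": { "spark-app-selector": "yoursparkapp" } } }'
echo "$OVERRIDES" | python3 -c 'import json, sys; print(json.load(sys.stdin)["metadata"]["labels"]["spark-app-selector"])'
# prints: yoursparkapp
```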

  3. Inside the pod created in step 2, execute spark-submit:

    spark-submit --master k8s://https://kubernetes_url:443 \
    --deploy-mode client \
    --name yoursparkapp \
    --conf spark.kubernetes.container.image=your_container:latest \
    --conf spark.kubernetes.pyspark.pythonVersion=3 \
    --conf spark.kubernetes.namespace=your_namespace \
    --conf spark.kubernetes.container.image.pullPolicy=Always \
    --conf spark.driver.memory=2g  \
    --conf spark.executor.memory=2g \
    --conf spark.submit.deployMode=client \
    --conf spark.executor.cores=3 \
    --conf spark.driver.cores=3 \
    --conf spark.driver.host=yoursparkapp \
    --conf spark.driver.port=7078 \
    --conf spark.kubernetes.driver.pod.name=yoursparkapp  \
     /path/to/your/remote_spark_app.py
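With the headless service in place, the driver also gets a stable in-cluster DNS name of the form service.namespace.svc.cluster.local, which is what the executors dial on spark.driver.port. A small sketch, using the placeholder names from the steps above:

```shell
# Build the driver's cluster DNS name from the service and namespace
# used above (placeholder values); executors connect back to this address.
SERVICE=yoursparkapp
NAMESPACE=your_namespace
DRIVER_HOST="${SERVICE}.${NAMESPACE}.svc.cluster.local"
echo "${DRIVER_HOST}:7078"
```

Within the same namespace, the short name (spark.driver.host=yoursparkapp) is enough, since cluster DNS search paths expand it.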
-- Gooseman
Source: StackOverflow