Have a look at the image from https://spark.apache.org/docs/latest/cluster-overview.html.
The Spark cluster is running outside Kubernetes, but I am going to run the driver program inside Kubernetes. The issue is how to let the Spark cluster know where the driver program is.
My kubernetes yaml file:
kind: List
apiVersion: v1
items:
  - kind: Deployment
    apiVersion: extensions/v1beta1
    metadata:
      name: counter-uat
    spec:
      replicas: 1
      selector:
        matchLabels:
          name: spark-driver
      template:
        metadata:
          labels:
            name: spark-driver
        spec:
          containers:
            - name: counter-uat
              image: counter:0.1.0
              command: ["/opt/spark/bin/spark-submit", "--class", "Counter", "--master", "spark://spark.uat:7077", "/usr/src/counter.jar"]
  - kind: Service
    apiVersion: v1
    metadata:
      name: spark-driver
      labels:
        name: spark-driver
    spec:
      type: NodePort
      ports:
        - name: port
          port: 4040
          targetPort: 4040
      selector:
        name: spark-driver
The error is:
Caused by: java.io.IOException: Failed to connect to /172.17.0.8:44117
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:228)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:179)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:197)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:191)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: io.netty.channel.AbstractChannel$AnnotatedNoRouteToHostException: Host is unreachable: /172.17.0.8:44117
The Spark cluster is trying to reach the driver program, whose IP is 172.17.0.8. 172.17.0.8 is probably an internal pod IP inside Kubernetes.
How can I fix the problem? How should I fix my YAML file? Thanks.
UPDATE
I added the following two parameters: "--conf", "spark.driver.bindAddress=192.168.42.8" and "--conf", "spark.driver.host=0.0.0.0".
But from the log, it is still trying to reach 172.17.0.8, which is the internal Kubernetes pod IP.
UPDATE
kind: List
apiVersion: v1
items:
  - kind: Deployment
    apiVersion: extensions/v1beta1
    metadata:
      name: counter-uat
    spec:
      replicas: 1
      selector:
        matchLabels:
          name: counter-driver
      template:
        metadata:
          labels:
            name: counter-driver
        spec:
          containers:
            - name: counter-uat
              image: counter:0.1.0
              command: ["/opt/spark/bin/spark-submit", "--class", "Counter", "--master", "spark://spark.uat:7077", "--conf", "spark.driver.bindAddress=192.168.42.8", "/usr/src/counter.jar"]
  - kind: Service
    apiVersion: v1
    metadata:
      name: counter-driver
      labels:
        name: counter-driver
    spec:
      type: NodePort
      ports:
        - name: driverport
          port: 42761
          targetPort: 42761
          nodePort: 30002
      selector:
        name: counter-driver
Another error:
2017-06-23T20:00:07.487656154Z Exception in thread "main" java.net.BindException: Cannot assign requested address: Service 'sparkDriver' failed after 16 retries (starting from 31319)! Consider explicitly setting the appropriate port for the service 'sparkDriver' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries.
Try setting spark.driver.host or spark.driver.bindAddress to "spark.uat", to "spark-driver.uat", or to the actual driver host in Spark itself. This is a common issue with this type of distributed project, where the master tells the client where to connect. If you don't specify spark.driver.host, Spark tries to figure out the proper host by itself and simply uses the IP it sees. But in this case the IP it sees is an internal Kubernetes pod IP, which may not be reachable from the client.
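As a sketch of how that could look in the asker's Deployment: the hostname `spark-driver.uat` is an assumption standing in for whatever DNS name or node address the external Spark workers can actually reach, and the fixed driver port must match the Service's `targetPort`/`nodePort` mapping (here 42761, as in the asker's second manifest).

```yaml
# Sketch only: spark-driver.uat is a placeholder hostname that the
# external Spark cluster must be able to resolve and reach.
command:
  - /opt/spark/bin/spark-submit
  - --class
  - Counter
  - --master
  - spark://spark.uat:7077
  - --conf
  - spark.driver.bindAddress=0.0.0.0      # bind on all interfaces inside the pod
  - --conf
  - spark.driver.host=spark-driver.uat    # address advertised to the cluster (assumption)
  - --conf
  - spark.driver.port=42761               # fixed port, must match the exposed Service port
  - /usr/src/counter.jar
```

Note the split: `spark.driver.bindAddress` is what the driver binds to locally (an external IP like 192.168.42.8 does not exist inside the pod, which would explain the `BindException`), while `spark.driver.host` is what it advertises to the workers.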
You can also try setting the SPARK_PUBLIC_DNS environment variable. It has a more promising description: "Hostname your Spark program will advertise to other machines."
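In the Deployment, that environment variable could be set on the container spec; as before, `spark-driver.uat` is a placeholder for a hostname the external cluster can actually reach:

```yaml
# Sketch: add an env entry to the existing container spec.
containers:
  - name: counter-uat
    image: counter:0.1.0
    env:
      - name: SPARK_PUBLIC_DNS
        value: spark-driver.uat   # hostname advertised to other machines (assumption)
```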