I am trying to submit a Spark job to Kubernetes. I went through https://spark.apache.org/docs/latest/running-on-kubernetes.html and successfully submitted a job with the command below:
$ bin/spark-submit \
--master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=<spark-image> \
local:///path/to/examples.jar
Now I am trying to submit my own job, which needs to access Kafka and PostgreSQL; both are reachable only over the VPN.
The job works locally when I run it from IntelliJ, but the same job fails when I submit it to Kubernetes.
The exception is:
Caused by: java.net.UnknownHostException: db-host-name
How can I resolve DNS name over the VPN?
If you can, try configuring the DNS options for the Docker container. Either of these two options has worked for DNS/VPN issues I've experienced in the past:
--dns=<IP_ADDRESS>
--dns-search=<DOMAIN>
Here are more detailed docs.
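A minimal sketch of how these two flags might be used to check name resolution locally, assuming Docker runs on a machine connected to the VPN. The DNS server 10.8.0.1, the domain corp.example.com and the function name check_dns_in_docker are hypothetical placeholders. Pods created by Kubernetes are not started with docker run flags (inside a pod the corresponding settings are the pod's dnsConfig fields), so treat this only as a way to confirm which DNS server and search domain resolve the database host.

```shell
# Hypothetical VPN settings: override them with your VPN's DNS server and domain.
VPN_DNS_IP="${VPN_DNS_IP:-10.8.0.1}"
VPN_DOMAIN="${VPN_DOMAIN:-corp.example.com}"

# Look up HOSTNAME inside a throwaway (--rm) container whose resolver uses
# the VPN's DNS server (--dns) and search domain (--dns-search).
# Usage: check_dns_in_docker IMAGE HOSTNAME   (IMAGE must contain nslookup)
# With DRY_RUN=1 the docker command is printed instead of executed.
check_dns_in_docker() {
  image="$1"
  host="$2"
  set -- docker run --rm "--dns=$VPN_DNS_IP" "--dns-search=$VPN_DOMAIN" \
    "$image" nslookup "$host"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

For example, check_dns_in_docker busybox db-host-name uses the busybox image because it ships nslookup; if the lookup succeeds with these settings but fails without them, the VPN's DNS server is the missing piece.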
One solution is to create a Service whose name resolves through the cluster DNS and routes traffic to an external endpoint, in this case your database.
Steps:
Create a Service without a selector (see the "Services without selectors" section of the Kubernetes Service documentation).
apiVersion: v1
kind: Service
metadata:
  name: DB_NAME
spec:
  ports:
    - protocol: TCP
      port: DB_PORT
      targetPort: DB_PORT
Replace DB_NAME and DB_PORT with the values appropriate to your case.
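The substitution can also be scripted. A minimal sketch, assuming a POSIX shell; the function name write_service_manifest and the example values are hypothetical. It also checks that the name is a valid Service name (an RFC 1035 label), since a literal placeholder such as DB_NAME, with uppercase letters and an underscore, would be rejected by the API server.

```shell
# Write a selector-less Service manifest.
# Usage: write_service_manifest NAME PORT FILE
write_service_manifest() {
  name="$1"; port="$2"; file="$3"
  # Service names must be RFC 1035 labels: lowercase letters, digits and '-',
  # starting with a letter, ending with a letter or digit, max 63 characters.
  if [ "${#name}" -gt 63 ] ||
     ! printf '%s\n' "$name" | grep -Eq '^[a-z]([-a-z0-9]*[a-z0-9])?$'; then
    echo "invalid Service name: $name" >&2
    return 1
  fi
  # The port must be an integer in 1..65535.
  case "$port" in
    ''|*[!0-9]*) echo "invalid port: $port" >&2; return 1 ;;
  esac
  if [ "$port" -lt 1 ] || [ "$port" -gt 65535 ]; then
    echo "invalid port: $port" >&2
    return 1
  fi
  cat > "$file" <<EOF
apiVersion: v1
kind: Service
metadata:
  name: $name
spec:
  ports:
    - protocol: TCP
      port: $port
      targetPort: $port
EOF
}
```

For example, write_service_manifest postgres-vpn 5432 service.yaml followed by kubectl apply -f service.yaml.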
Apply it by running the command below:
$ kubectl apply -f FILE_NAME.yaml
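Because the Service created above has no selector, it has no endpoints to direct traffic to.
The Endpoints object below supplies that address, so connections to the service name (for example, the host in your JDBC URL) are forwarded to your database.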
apiVersion: v1
kind: Endpoints
metadata:
  name: DB_NAME
subsets:
  - addresses:
      - ip: DB_IP_ADDRESS
    ports:
      - port: DB_PORT
Make sure that the name of the Endpoints object is the same as the name of the Service above.
subsets:
  - addresses:
      - ip: DB_IP_ADDRESS
    ports:
      - port: DB_PORT
In this part, replace DB_IP_ADDRESS and DB_PORT with your database's IP address and port.
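A similar sketch for the Endpoints manifest, using the hypothetical function name write_endpoints_manifest. Endpoints take a literal IP address rather than a hostname, and Kubernetes does not accept loopback or link-local addresses there, so the function rejects those; the IPv4 test is only a rough shape check and does not range-check octets.

```shell
# Write an Endpoints manifest for a selector-less Service.
# Usage: write_endpoints_manifest NAME IP PORT FILE
write_endpoints_manifest() {
  name="$1"; ip="$2"; port="$3"; file="$4"
  # Rough IPv4 shape check: four dot-separated groups of 1-3 digits.
  if ! printf '%s\n' "$ip" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'; then
    echo "not an IPv4 address: $ip" >&2
    return 1
  fi
  # Loopback and link-local ranges are not allowed as endpoint IPs.
  case "$ip" in
    127.*|169.254.*|224.0.0.*)
      echo "address not allowed in Endpoints: $ip" >&2
      return 1 ;;
  esac
  cat > "$file" <<EOF
apiVersion: v1
kind: Endpoints
metadata:
  name: $name
subsets:
  - addresses:
      - ip: $ip
    ports:
      - port: $port
EOF
}
```

For example, write_endpoints_manifest postgres-vpn 10.20.30.40 5432 endpoints.yaml, where 10.20.30.40 stands in for your database's VPN address.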
Apply it by running the command below:
$ kubectl apply -f FILE_NAME.yaml
Run an example pod with curl installed and check whether the Service responds to your requests:
$ curl DB_NAME:DB_PORT
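Because curl speaks HTTP and PostgreSQL does not, this command will not return a normal response even when everything works; curl's exit status shows how far it got. curl documents exit code 6 as "couldn't resolve host", 7 as "failed to connect to host" and 28 as "operation timeout". A minimal sketch that interprets these, with the hypothetical function name probe_with_curl, to be run inside the test pod as probe_with_curl DB_NAME DB_PORT:

```shell
# Probe HOST:PORT with curl and translate curl's exit status.
# Against a non-HTTP service such as PostgreSQL, codes other than 6, 7 and 28
# (for example 52, "empty reply") are assumed to mean DNS and TCP both worked.
probe_with_curl() {
  host="$1"; port="$2"
  if ! command -v curl >/dev/null 2>&1; then
    echo "curl not found" >&2
    return 2
  fi
  # --noproxy '*' tests the pod's own DNS and routing rather than a proxy's.
  curl -s -o /dev/null --noproxy '*' --connect-timeout 5 --max-time 10 \
    "http://$host:$port/"
  rc=$?
  case "$rc" in
    0)  echo "OK: HTTP response from $host:$port" ;;
    6)  echo "DNS: could not resolve $host"; return 1 ;;
    7)  echo "TCP: resolved $host but could not connect to port $port"; return 1 ;;
    28) echo "TIMEOUT: no answer from $host:$port within the time limit"; return 1 ;;
    *)  echo "REACHABLE: connected to $host:$port (curl exit $rc)" ;;
  esac
}
```

A "DNS" result points at the Service name, a "TCP" or "TIMEOUT" result at the Endpoints address or the route to the VPN.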
Take a look at these additional resources, which explain other approaches: