Apache Spark on Kubernetes (Docker for Mac) is not able to resolve my VPN hosts

1/26/2020

I am trying to submit a job via Kubernetes. I went through https://spark.apache.org/docs/latest/running-on-kubernetes.html and successfully submitted a job with the command below:

$ bin/spark-submit \
--master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=<spark-image> \
local:///path/to/examples.jar

Now I am trying to submit my own job, which needs access to Kafka and PostgreSQL. Both are reachable only over the VPN.

The job works on my local machine when run from IntelliJ, but the same job fails when I submit it to Kubernetes.

The exception is:

Caused by: java.net.UnknownHostException: db-host-name

How can I resolve DNS names over the VPN?

[Image: screenshot not preserved] IP is xx

-- JDev
apache-spark
apache-spark-sql
docker
kubernetes

2 Answers

1/26/2020

If you can, try configuring the DNS options when running the Docker image. Either of these two options has worked for DNS/VPN issues I've experienced in the past:

  • --dns=<IP_ADDRESS>
  • --dns-search=<DOMAIN>

Here are more detailed docs.

-- CheeseFerret
Source: StackOverflow

1/29/2020

One solution is to create a Service whose name resolves through the cluster DNS and which routes traffic to the external endpoint, that is, your database.

Steps:

  • Create a service without selector
  • Create an endpoint manually
  • Test it

Create a service without selector

Create a Service without a selector, as described in the Kubernetes documentation on services without selectors.

apiVersion: v1
kind: Service
metadata:
  name: DB_NAME
spec:
  ports:
    - protocol: TCP
      port: DB_PORT
      targetPort: DB_PORT

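Replace DB_NAME and DB_PORT with the values appropriate to your case. For the job to resolve the name unchanged, DB_NAME should match the hostname the job connects to (db-host-name in the question), and the Service should live in the same namespace as the Spark pods.

Apply it by running the command below: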

$ kubectl apply -f FILE_NAME.yaml

Create an endpoint

The Service created earlier does not yet have an endpoint to direct traffic to.

The Endpoints object below will be used when the service name is called.

apiVersion: v1
kind: Endpoints
metadata:
  name: DB_NAME
subsets:
  - addresses:
      - ip: DB_IP_ADDRESS
    ports:
      - port: DB_PORT

Make sure that the name of the Endpoints object is the same as the name of the Service above.

subsets:
  - addresses:
      - ip: DB_IP_ADDRESS
    ports:
      - port: DB_PORT

Pay particular attention to the part above and replace DB_IP_ADDRESS and DB_PORT with your values. DB_IP_ADDRESS must be an IP address, not a hostname.

Apply it by running the command below:

$ kubectl apply -f FILE_NAME.yaml
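
Since the Service and the Endpoints object must share a name and port, it can help to generate both from one set of values. The following is only a sketch, not part of the original answer: the function names, the sample name db-host-name, the address 10.0.0.15 and the port 5432 are all hypothetical placeholders. It writes JSON rather than YAML, which kubectl apply -f also accepts, and wraps both objects in a v1 List so that one file covers them.

```python
import ipaddress
import json


def build_manifests(name, ip, port):
    """Return (service, endpoints) dicts that share the same name and port."""
    # Endpoints addresses must be IP addresses; this raises ValueError otherwise.
    ipaddress.ip_address(ip)
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {"ports": [{"protocol": "TCP", "port": port, "targetPort": port}]},
    }
    endpoints = {
        "apiVersion": "v1",
        "kind": "Endpoints",
        "metadata": {"name": name},  # must equal the Service name
        "subsets": [{"addresses": [{"ip": ip}], "ports": [{"port": port}]}],
    }
    return service, endpoints


def to_kubectl_list(service, endpoints):
    """Serialize both objects as a single v1 List document (JSON)."""
    doc = {"apiVersion": "v1", "kind": "List", "items": [service, endpoints]}
    return json.dumps(doc, indent=2)


if __name__ == "__main__":
    # Hypothetical values; replace them with your own database details.
    svc, eps = build_manifests("db-host-name", "10.0.0.15", 5432)
    print(to_kubectl_list(svc, eps))
```

Assuming the script is saved as make_manifests.py (a made-up file name), the output can be redirected to a file and applied with the same kubectl apply -f command as above.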

Test it

Run an example pod with curl installed and check whether the created Service responds to your requests:

$ curl DB_NAME:DB_PORT
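
curl speaks HTTP, which PostgreSQL and Kafka do not, so its output may be hard to interpret for these services. As an alternative sketch (not from the original answer), the Python script below separates the two questions that matter here: does the name resolve, and does a TCP connection to the port succeed? The function name check_tcp and the script name are hypothetical, and it assumes a pod image that has Python 3 available.

```python
import socket
import sys


def check_tcp(host, port, timeout=5.0):
    """Resolve host, then attempt a TCP connect.

    Returns (resolved_ips, connected). An empty list means DNS resolution
    failed, which is the situation behind an UnknownHostException.
    """
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return [], False
    ips = sorted({info[4][0] for info in infos})
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return ips, True
    except OSError:
        # Name resolved, but the port is closed, filtered, or unreachable.
        return ips, False


if __name__ == "__main__" and len(sys.argv) == 3:
    ips, ok = check_tcp(sys.argv[1], int(sys.argv[2]))
    if not ips:
        print("DNS resolution failed")
    else:
        print("resolved to", ", ".join(ips))
        print("TCP connect succeeded" if ok else "TCP connect failed")
```

Run it inside a pod in the same namespace as the Service, for example as python check_tcp.py DB_NAME DB_PORT. A resolution failure points at the Service name or namespace, while a connect failure points at the Endpoints IP, the port, or VPN routing from the cluster.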

Alternatives

Take a look at additional resources that explain other approaches:

-- Dawid Kruk
Source: StackOverflow