Custom DNS resolver for Google Cloud Dataflow pipeline

3/16/2017

I am trying to access Kafka and third-party services (e.g., InfluxDB) running in GKE from a Dataflow pipeline.

I have a DNS server for service discovery, also running in GKE, and a route in my network that makes the GKE IP range reachable from the Dataflow instances; that part works fine. I can manually run nslookup from the Dataflow instances against my custom server without issues.

However, I cannot find a proper way to configure an additional DNS server when running my Dataflow pipeline. How can I achieve that, so that KafkaIO and similar sources/sinks can resolve hostnames against my custom DNS server?

sun.net.spi.nameservice.nameservers is tricky to use because it must be set very early, before the name service is statically instantiated. I would normally pass it with java -D, but Dataflow launches the worker JVMs itself, so there is no obvious place to do that.
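For reference, this is roughly how those properties would be set programmatically on JDK 8 and earlier (a sketch only; 10.128.0.53 is a placeholder for the in-GKE resolver, and on Dataflow workers there is no guarantee this class is loaded before the harness performs its first lookup, which is exactly the problem):

    public class PipelineMain {
        static {
            // Must run before InetAddress initializes its name service (JDK <= 8).
            // Note: this replaces, rather than appends to, the JVM's default resolver.
            System.setProperty("sun.net.spi.nameservice.provider.1", "dns,sun");
            System.setProperty("sun.net.spi.nameservice.nameservers", "10.128.0.53");
        }

        public static void main(String[] args) {
            // ... build and run the Dataflow pipeline as usual ...
        }
    }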

In addition, I would not want to simply replace the system's resolvers, but rather append a new one to the GCP project-specific resolvers that the instances come pre-configured with.

Finally, I have not found any way to use a startup script with the Dataflow instances, as one would with a regular GCE instance.

-- peay
google-cloud-dataflow
google-cloud-platform
google-kubernetes-engine

1 Answer

3/21/2017

I can't think of a way today to specify a custom DNS server for a VM other than editing the /etc/resolv.conf file [1] on the box. I don't know if it is possible to share the default network. If it is, machines are available at hostName.c.[PROJECT_ID].internal, which may serve your purpose if hostName is stable [2].

[1] https://cloud.google.com/compute/docs/networking#internal_dns_and_resolvconf
[2] https://cloud.google.com/compute/docs/networking
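If the internal DNS route does work for you, the pipeline can then address the brokers by those names directly and no custom resolver is needed. A rough sketch with Apache Beam's KafkaIO, where the host name, project ID, and topic are placeholders (exact method names vary by SDK version):

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.kafka.KafkaIO;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class ReadFromKafka {
        public static void main(String[] args) {
            Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

            // Assumes the broker VM is reachable through Compute Engine's internal
            // DNS on the default network, per the answer above.
            p.apply(KafkaIO.<String, String>read()
                .withBootstrapServers("kafka-broker.c.my-project.internal:9092")
                .withTopic("metrics")
                .withKeyDeserializer(StringDeserializer.class)
                .withValueDeserializer(StringDeserializer.class));

            p.run();
        }
    }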

-- rf-
Source: StackOverflow