How to submit a Dask job to a remote Kubernetes cluster from a local machine

3/7/2019

I have a Kubernetes cluster set up using Kubernetes Engine on GCP. I have also installed Dask using the Helm package manager. My data are stored in a Google Cloud Storage bucket.

Running kubectl get services on my local machine yields the following output:

(screenshot of the kubectl get services output, listing the Dask services and their external IPs)

I can open the dashboard and the Jupyter notebook using the external IPs without any problems. However, I'd like to develop a workflow where I write code on my local machine, submit the script to the remote cluster, and run it there.

How can I do this?

I tried following the instructions in Submitting Applications using dask-remote. I also tried exposing the scheduler using kubectl expose deployment with type LoadBalancer, though I do not know if I did this correctly. Suggestions are greatly appreciated.
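For reference, this is roughly the command I used to expose the scheduler (a guess at the right invocation; it assumes the Helm chart created a deployment named dask-scheduler, and the actual name may differ depending on the release):

kubectl expose deployment dask-scheduler --type=LoadBalancer --name=dask-scheduler-external --port=8786 --target-port=8786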

-- PollPenn
dask
dask-distributed
kubernetes

1 Answer

3/16/2019

Yes: if your client and workers share the same software environment, then you should be able to connect a Client to the remote scheduler using its publicly visible IP.

from dask.distributed import Client

# The Dask scheduler listens on TCP port 8786 by default
client = Client('tcp://REDACTED_EXTERNAL_SCHEDULER_IP:8786')
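Once connected, anything you compute through that client runs on the remote workers rather than locally. A minimal sketch to verify this (the gcs:// path is a hypothetical placeholder; reading from the bucket requires gcsfs installed in both the client and worker environments):

import dask.array as da
import dask.dataframe as dd

# The task graph is built locally but executed on the remote workers
x = da.random.random((10000, 10000), chunks=(1000, 1000))
print(x.mean().compute())

# Data in the Google Cloud Storage bucket can be read directly by the workers
# (hypothetical bucket path; requires gcsfs on client and workers)
df = dd.read_csv('gcs://your-bucket/path/*.csv')
print(df.head())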
-- MRocklin
Source: StackOverflow