I have a Kubernetes cluster set up using Google Kubernetes Engine on GCP. I have also installed Dask using the Helm package manager. My data are stored in a Google Cloud Storage bucket.
Running kubectl get services on my local machine yields the following output:
I can open the dashboard and the Jupyter notebook using the external IP without any problems. However, I'd like to develop a workflow where I write code on my local machine, submit the script to the remote cluster, and run it there.
How can I do this?
I tried following the instructions in Submitting Applications using dask-remote. I also tried exposing the scheduler using kubectl expose deployment with type LoadBalancer, though I do not know if I did this correctly. Suggestions are greatly appreciated.
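For concreteness, the expose command I ran had roughly this shape (dask-scheduler is my guess at the deployment name created by the Helm chart, and 8786 is the scheduler's default port; I may have the names or flags wrong):

kubectl expose deployment dask-scheduler --type=LoadBalancer --name=dask-scheduler-public --port=8786 --target-port=8786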
Yes, if your client and workers share the same software environment (matching versions of dask and distributed, plus any libraries your computation uses), then you should be able to connect a client to a remote scheduler using its publicly visible IP:
from dask.distributed import Client

# 8786 is the default Dask scheduler port
client = Client('REDACTED_EXTERNAL_SCHEDULER_IP:8786')
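Once connected, the client drives the remote workers directly from your local Python session. Here is a minimal sketch of the workflow this enables; the bucket path is a placeholder, and reading from Google Cloud Storage assumes gcsfs is installed in both the client and worker environments:

from dask.distributed import Client
import dask.dataframe as dd

# connect to the remote scheduler (8786 is the default scheduler port)
client = Client('REDACTED_EXTERNAL_SCHEDULER_IP:8786')
print(client)  # confirm the scheduler and workers are reachable

# a trivial task executed on the remote workers
future = client.submit(lambda x: x + 1, 10)
print(future.result())  # 11

# lazily read data straight from the GCS bucket (placeholder path)
df = dd.read_csv('gcs://YOUR_BUCKET/path/to/*.csv')
print(df.head())  # triggers computation on the cluster

A script written this way runs from your local machine while the data and computation stay on the cluster.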