Trying Dask on AWS

8/31/2018

I am a scientist who is exploring the use of Dask on Amazon Web Services. I have some experience with Dask, but none with AWS. I have a few large custom task graphs to execute, and a few colleagues who may want to do the same if I can show them how. I believe that I should be using Kubernetes with Helm because I fall into the "Try out Dask for the first time on a cloud-based system like Amazon, Google, or Microsoft Azure" category.

  1. I also fall into the "Dynamically create a personal and ephemeral deployment for interactive use" category. Should I be trying native Dask-Kubernetes instead of Helm? It seems simpler, but it's hard to judge the trade-offs.
  2. In either case, how do you provide Dask workers a uniform environment that includes your own Python packages (not on any package index)? The solution I've found suggests that packages need to be on a pip or conda index.

Thanks for any help!

-- jkmacc
amazon-web-services
dask
dask-distributed
kubernetes

1 Answer

9/2/2018

Use Helm or Dask-Kubernetes?

You can use either. Generally, starting with Helm is simpler.

How to include custom packages

You can install custom software using pip or conda. The packages don't need to be on PyPI or the Anaconda default channel; you can point pip or conda at other channels or sources. Here is an example of installing software from GitHub using pip:

pip install git+https://github.com/username/repository@branch

For small custom files you can also use the Client.upload_file method.
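
As a rough illustration of the Client.upload_file route, here is a minimal sketch. The module name `myhelpers` and its `double` function are made up for the example, and `Client(processes=False)` starts an in-process local cluster only so the snippet can run on a laptop; on a real deployment you would pass your scheduler's address to `Client` instead.

```python
import tempfile
from pathlib import Path


def write_helper_module(directory: Path) -> Path:
    """Write a tiny stand-in for private code (hypothetical module `myhelpers`)."""
    path = directory / "myhelpers.py"
    path.write_text("def double(x):\n    return 2 * x\n")
    return path


def run_on_cluster(module_path: Path) -> int:
    # Imported here so the file-writing helper above works without Dask installed.
    from dask.distributed import Client

    # Assumption: an in-process local cluster, used purely for illustration.
    with Client(processes=False) as client:
        # Send the file to the workers so they can import it.
        client.upload_file(str(module_path))

        def task(x):
            import myhelpers  # resolved on the worker from the uploaded file

            return myhelpers.double(x)

        return client.submit(task, 21).result()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_on_cluster(write_helper_module(Path(tmp))))
```

This suits small, fast-changing helper code. Larger or compiled dependencies are usually better installed into the worker environment with pip or conda, as described above.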

-- MRocklin
Source: StackOverflow