Dask - Kubernetes - Tutorial example

2/13/2019

I have just finished setting up Dask on a Kubernetes cluster using Helm. Now that I want to work through the basic tutorials in the Jupyter notebook, I run into the following error:

[image: error-dask]

I have also tried, in another notebook, to analyze a 40 GB dataset, but the following commands are very slow to run (I am just reading 40 GB of CSV files from GCS and then calling value_counts on a binary column):

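import dask.dataframe as dd  # alias must match the dd.read_csv call below
import gcsfs

# List the bucket contents to confirm the CSV files are visible
fs = gcsfs.GCSFileSystem(project='tme-chrome')
fs.ls('tme-churning')

# Lazily read every CSV file in the bucket
df = dd.read_csv('gs://tme-churning/*.csv')

# Count the values of the binary column; compute() triggers the actual read
df['churning'].value_counts().compute()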

Thanks a lot for your help. I seem to be missing something here.

-- Charles Verleyen
dask
dask-distributed
dataframe
google-cloud-platform
kubernetes

1 Answer

2/20/2019

I tried to reproduce this issue using the Dask Helm chart found here, but wasn't able to. These are the steps I took:

1. helm install -n stable-dask stable/dask
2. Go to the Jupyter IP:PORT shown in the output.
3. Run the first few cells in the notebook.

Are you using a different Helm chart?
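
On the slow value_counts in the question: for intuition, here is a minimal pure-Python sketch of the same computation. Each CSV is scanned in full to count the values of one column, and the per-file counts are then merged. This only illustrates the count-then-merge pattern and is not Dask's actual implementation; the helper names count_column and merge_counts and the sample data are hypothetical. Because CSV is row-oriented, every row has to be read and parsed even though only one column is used.

```python
import csv
import io
from collections import Counter


def count_column(csv_file, column):
    """Count the values of one column in a single CSV file (per-file step)."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)


def merge_counts(partial_counts):
    """Merge the per-file counters into one total (aggregation step)."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


# Hypothetical in-memory stand-ins for the CSV files in the bucket.
files = [
    io.StringIO("user,churning\na,0\nb,1\nc,0\n"),
    io.StringIO("user,churning\nd,1\ne,1\n"),
]

partials = [count_column(f, "churning") for f in files]
totals = merge_counts(partials)
print(dict(totals))
```

In Dask the per-file step is the part that runs on the workers, so the wall time is likely dominated by how quickly they can read and parse the CSVs from GCS, which in turn depends on how many workers the cluster has.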

-- Ryan McCormick
Source: StackOverflow