Working with big data on Dask Kubernetes in Azure Kubernetes Service (AKS)

6/2/2019

I want to run analysis on a dataset (a CSV file) of about 8 GB that currently sits on my laptop's hard disk. I have already set up a Dask Kubernetes cluster on AKS with 1 scheduler and 3 workers, each with 7 GB of memory.

How can I work on my dataset using this Dask Kubernetes cluster on AKS? Which file system would be best for sharing the dataset between the workers?

Any suggestions on where I should store this dataset so that I can work with it easily?

The method should work both from a Jupyter notebook and from a plain Python script.

-- dev
azure-aks
dask
dask-kubernetes
dataset
kubernetes

1 Answer

6/2/2019

You would probably want to upload your data to Azure Blob Storage. There is more information about Dask and remote data services (including Azure) here:

https://docs.dask.org/en/latest/remote-data-services.html
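As a rough illustration of that approach, here is a minimal sketch: it assumes the CSV has already been uploaded to a blob container (for example with the `az storage blob upload` CLI command or Azure Storage Explorer), that the adlfs/fsspec Azure backend is installed on the client and on every worker image, and that the scheduler address, container name, and credentials shown are placeholders you would replace with your own values.

```python
import dask.dataframe as dd
from dask.distributed import Client

# Connect to the scheduler of the existing dask-kubernetes cluster on AKS.
# The address is a placeholder; use the service address your deployment exposes.
client = Client("tcp://<scheduler-service-ip>:8786")

# Read the CSV directly from Azure Blob Storage via the fsspec/adlfs backend.
# "mycontainer" and the credentials are placeholders.
df = dd.read_csv(
    "az://mycontainer/mydata.csv",
    storage_options={
        "account_name": "<storage-account-name>",
        "account_key": "<storage-account-key>",
    },
)

# Example computation: each worker pulls only the blocks it needs from blob
# storage, so the 8 GB file never has to fit on any single machine.
print(df.describe().compute())
```

The same code runs unchanged from a Jupyter notebook or a standalone Python script, since the only requirement is network access to the scheduler and to the storage account.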

-- MRocklin
Source: StackOverflow