Why does the Dask client say my cluster has more cores and memory than the actual total available?

3/5/2019

I'm trying to understand the relationship between Kubernetes pods and the cores and memory of my cluster nodes when using Dask.

My current setup is as follows:

  • Kubernetes cluster using GCP's Kubernetes Engine
  • Helm package manager to install Dask on the cluster

Each node has 8 cores and 30 GB of RAM. I have 5 nodes in my cluster:

[screenshot: cluster info]

I then scaled the number of pods to 50 by executing

kubectl scale --replicas 50 deployment/nuanced-armadillo-dask-worker

When I initialize the client with dask.distributed, I see the following:

[screenshot: dask.distributed client info]

What puzzles me is that the client says there are 400 cores and 1.58 TB of memory in my cluster (see the screenshot). I suspect that by default each pod is being allocated 8 cores and 30 GB of memory, but how is this possible given the actual number of cores and amount of memory available on each node?
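For what it's worth, the totals are consistent with each of the 50 worker pods claiming a whole node's resources. A quick sanity check (the ~31.6 GB per worker is an assumption that matches the screenshot's 1.58 TB total; Dask reports memory in decimal units):

```python
# Illustrative arithmetic for the totals the Dask client reports.
# Assumption: each of the 50 worker pods claims a full node,
# i.e. 8 cores and ~31.6 GB of memory.
workers = 50
cores_per_worker = 8
memory_per_worker_gb = 31.6

total_cores = workers * cores_per_worker                  # 400 cores
total_memory_tb = workers * memory_per_worker_gb / 1000   # ~1.58 TB
print(total_cores, total_memory_tb)
```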

-- PollPenn
dask
kubernetes

1 Answer

3/6/2019

If you don't specify a number of cores or an amount of memory, then every Dask worker tries to take up the entire machine on which it is running.

For the Helm chart, you can specify the number of cores and amount of memory per worker by adding resource limits to your worker pod specification. These are listed in the configuration options of the chart.
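As a minimal sketch, a values file for the Dask Helm chart might look like the following (the `worker.resources` layout follows standard Kubernetes resource requests/limits; check the chart version you installed for the exact keys):

```yaml
# values.yaml -- illustrative resource limits per worker pod
worker:
  replicas: 50
  resources:
    limits:
      cpu: 1
      memory: 3G
    requests:
      cpu: 1
      memory: 3G
```

You would then apply it with something like `helm upgrade nuanced-armadillo stable/dask -f values.yaml` (release and chart names here mirror the ones in the question). With limits set, each worker advertises only its allotted CPU and memory, so the client's totals reflect what the cluster can actually provide.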

-- MRocklin
Source: StackOverflow