Noob here. I want a Dask install with a worker pool that can grow and shrink based on current demand. I followed the Zero to JupyterHub instructions to install JupyterHub on GKE, and then went through the install instructions for dask-kubernetes: https://kubernetes.dask.org/en/latest/.
I originally ran into some permissions issues, so I created a service account with all permissions and changed my config.yaml to use this service account. That got rid of the permissions issues, but now when I run this script, with the default worker-spec.yml, I get no workers:
from dask_kubernetes import KubeCluster
import distributed
cluster = KubeCluster.from_yaml('worker-spec.yml')
cluster.scale_up(4)  # request 4 workers explicitly
client = distributed.Client(cluster)
client
Cluster
Workers: 0
Cores: 0
Memory: 0 B
When I list my pods, I see a lot of workers in the pending state:
patrick_mineault@cloudshell:~ (neuron-264716)$ kubectl get pod --namespace jhub
NAME                          READY   STATUS    RESTARTS   AGE
dask-jovyan-24034fcc-22qw7w   0/1     Pending   0          45m
dask-jovyan-24034fcc-25h89q   0/1     Pending   0          45m
dask-jovyan-24034fcc-2bpt25   0/1     Pending   0          45m
dask-jovyan-24034fcc-2dthg6   0/1     Pending   0          45m
dask-jovyan-25b11132-52rn6k   0/1     Pending   0          26m
...
And when I describe each pod, I see that scheduling fails because of insufficient CPU and memory:
Events:
Type     Reason            Age                 From               Message
----     ------            ----                ----               -------
Warning  FailedScheduling  69s (x22 over 30m)  default-scheduler  0/1 nodes are available: 1 Insufficient cpu, 1 Insufficient memory.
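For what it's worth, that scheduler message boils down to simple arithmetic: a pod stays Pending when its CPU or memory request is larger than what is still free on every node. Here is a rough sketch of that check in Python. The function names are made up for illustration, the free-capacity figures are invented, and the 2 CPU / 6G worker request is an assumption about what the worker spec asks for (check your own worker-spec.yml).

```python
# Rough capacity check: how many worker pods fit on one node?
# All quantities used below are illustrative assumptions, not values read from a real cluster.

_MEM_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "k": 10**3, "M": 10**6, "G": 10**9}


def parse_cpu(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity such as '2' or '500m' to cores."""
    if quantity.endswith("m"):
        return float(quantity[:-1]) / 1000
    return float(quantity)


def parse_memory(quantity: str) -> float:
    """Convert a Kubernetes memory quantity such as '6G' or '512Mi' to bytes."""
    for suffix, factor in _MEM_UNITS.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor
    return float(quantity)  # plain number of bytes


def workers_that_fit(free_cpu: str, free_mem: str, worker_cpu: str, worker_mem: str) -> int:
    """Number of worker pods whose requests fit into the free capacity of one node."""
    by_cpu = parse_cpu(free_cpu) // parse_cpu(worker_cpu)
    by_mem = parse_memory(free_mem) // parse_memory(worker_mem)
    return int(min(by_cpu, by_mem))


# A nearly full node (hypothetical numbers): nothing fits, so pods stay Pending.
print(workers_that_fit("500m", "1Gi", "2", "6G"))  # 0
# An empty, larger node (hypothetical numbers): CPU is the limiting resource.
print(workers_that_fit("8", "32Gi", "2", "6G"))    # 4
```

If the first number comes out as 0 for every node in the cluster, no amount of waiting will get the workers scheduled.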
Do I need to manually create a new autoscaling pool in GKE or something? I only have one pool now, the one that runs JupyterLab, and that pool is already fully committed. I can't figure out which piece of configuration tells Dask which pool to put the workers in.
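It turns out I did need to create a separate, scalable node pool to host the workers. As far as I can tell, nothing in the Dask configuration picks a pool: the Kubernetes scheduler places worker pods on any node with enough free CPU and memory, so the cluster just needs a pool with spare capacity. There's an example of this in the Pangeo setup guide: https://github.com/pangeo-data/pangeo/blob/master/gce/setup-guide/1_create_cluster.sh. This is the relevant line: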
gcloud container node-pools create worker-pool --zone=$ZONE --cluster=$CLUSTER_NAME \
--machine-type=$WORKER_MACHINE_TYPE --preemptible --num-nodes=$MIN_WORKER_NODES
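Once a pool with room for workers exists, the Dask side can grow and shrink on demand through adaptive scaling. Below is a minimal sketch, assuming the classic KubeCluster's adapt(minimum=..., maximum=...) method. adapt_to_pool is a hypothetical helper, and the node and workers-per-node counts are placeholders rather than values from the Pangeo script. It also assumes the pool can actually hold that many workers, for example because GKE node autoscaling is enabled on it.

```python
def adapt_to_pool(cluster, max_nodes: int, workers_per_node: int, minimum: int = 0) -> int:
    """Turn on adaptive scaling, capped at what the worker node pool can hold.

    `cluster` is expected to be a dask_kubernetes.KubeCluster (or anything else
    exposing an `adapt(minimum=..., maximum=...)` method).
    Returns the worker ceiling that was passed to `adapt`.
    """
    maximum = max_nodes * workers_per_node
    cluster.adapt(minimum=minimum, maximum=maximum)
    return maximum


# Hypothetical usage in the notebook (placeholder numbers):
#
#   from dask_kubernetes import KubeCluster
#   cluster = KubeCluster.from_yaml('worker-spec.yml')
#   adapt_to_pool(cluster, max_nodes=5, workers_per_node=4)
```

Capping the maximum at the pool's capacity keeps Dask from requesting pods that would only sit in Pending again.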