I am attempting to deploy an adaptive dask kubernetes cluster to my aws K8s instance (I want to use the kubeControl interface found here). It is unclear to me where and how I execute this code such that it is active on my existing cluster. In addition to this, I want to have an ingress rule such that another ec2 instance I have can connect to the cluster and execute code within an aws VPC to maintain security and network performance.
So far I have managed to get a functional k8s cluster running with dask and jupyterhub running on it. I am using the sample helm chart found here which reference the docker image here. I can see this image does not even install dask-kubernetes. With that being said, I am able to connect to this cluster from my other ec2 instance using the exposed AWS dns server and execute custom code but this is not the kubernetes native dask cluster.
I have worked on modifying the deploy yaml for kubernetes but it is unclear to me what I would need to change to have it use the proper kubernetes cluster/schedulers. I do know I need to modify the docker image I am using to have in install dask-kubernetes, but this still does not help me. Below is the sample helm deploy chart I am using
---
# nameOverride: dask
# fullnameOverride: dask
scheduler:
name: scheduler
image:
repository: "daskdev/dask"
tag: 2.3.0
pullPolicy: IfNotPresent
# See https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
pullSecrets:
# - name: regcred
replicas: 1
# serviceType: "ClusterIP"
# serviceType: "NodePort"
serviceType: "LoadBalancer"
servicePort: 8786
resources: {}
# limits:
# cpu: 1.8
# memory: 6G
# requests:
# cpu: 1.8
# memory: 6G
tolerations: []
nodeSelector: {}
affinity: {}
webUI:
name: webui
servicePort: 80
worker:
name: worker
image:
repository: "daskdev/dask"
tag: 2.3.0
pullPolicy: IfNotPresent
# dask_worker: "dask-cuda-worker"
dask_worker: "dask-worker"
pullSecrets:
# - name: regcred
replicas: 3
aptPackages: >-
default_resources: # overwritten by resource limits if they exist
cpu: 1
memory: "4GiB"
env:
# - name: EXTRA_CONDA_PACKAGES
# value: numba xarray -c conda-forge
# - name: EXTRA_PIP_PACKAGES
# value: s3fs dask-ml --upgrade
resources: {}
# limits:
# cpu: 1
# memory: 3G
# nvidia.com/gpu: 1
# requests:
# cpu: 1
# memory: 3G
# nvidia.com/gpu: 1
tolerations: []
nodeSelector: {}
affinity: {}
jupyter:
name: jupyter
enabled: true
image:
repository: "daskdev/dask-notebook"
tag: 2.3.0
pullPolicy: IfNotPresent
pullSecrets:
# - name: regcred
replicas: 1
# serviceType: "ClusterIP"
# serviceType: "NodePort"
serviceType: "LoadBalancer"
servicePort: 80
# This hash corresponds to the password 'dask'
password: 'sha1:aae8550c0a44:9507d45e087d5ee481a5ce9f4f16f37a0867318c'
env:
# - name: EXTRA_CONDA_PACKAGES
# value: "numba xarray -c conda-forge"
# - name: EXTRA_PIP_PACKAGES
# value: "s3fs dask-ml --upgrade"
resources: {}
# limits:
# cpu: 2
# memory: 6G
# requests:
# cpu: 2
# memory: 6G
tolerations: []
nodeSelector: {}
affinity: {}
To run a Dask cluster on Kubernetes there are three recommended approaches. Each of these approaches require you to have an existing Kubernetes cluster and credentials correctly configured (kubectl
works locally).
Dask Helm Chart
You can deploy a standalone Dask cluster using the Dask helm chart.
helm repo add dask https://helm.dask.org/
helm repo update
helm install --name my-release dask/dask
Note that this is not an adaptive cluster but you can scale it by modifying the size of the deployment via kubectl
.
kubectl scale deployment dask-worker --replicas=10
Python dask-kubernetes
API
You can also use dask-kubernetes
which is a Python library for creating ad-hoc clusters on the fly.
pip install dask-kubernetes
from dask_kubernetes import KubeCluster
cluster = KubeCluster()
cluster.scale(10) # specify number of nodes explicitly
cluster.adapt(minimum=1, maximum=100) # or dynamically scale based on current workload
This will create a Dask cluster from scratch and will tear it down when the cluster
object is garbage collected (most likely on exit).
Dask Gateway
Dask Gateway provides a secure, multi-tenant server for managing Dask clusters.
To get started on Kubernetes you need to create a Helm configuration file (config.yaml
) with a gateway proxy token.
gateway:
proxyToken: "<RANDOM TOKEN>"
Hint: You can generate a suitable token with openssl rand -hex 32
.
Then install the chart.
helm repo add dask-gateway https://dask.org/dask-gateway-helm-repo/
helm repo update
helm install --values config.yaml my-release dask-gateway/dask-gateway