I am setting up GPU monitoring on a cluster using a DaemonSet
and NVIDIA DCGM. Obviously it only makes sense to monitor nodes that have a GPU.
I'm trying to use nodeSelector
for this purpose, but the documentation states that:
For the pod to be eligible to run on a node, the node must have each of the indicated key-value pairs as labels (it can have additional labels as well). The most common usage is one key-value pair.
I intended to check if the label beta.kubernetes.io/instance-type
was any of those:
[p3.2xlarge, p3.8xlarge, p3.16xlarge, p2.xlarge, p2.8xlarge, p2.16xlarge, g3.4xlarge, g3.8xlarge, g3.16xlarge]
But I don't see how to make an or
relationship when using nodeSelector
?
Node Affinity was the solution:
spec:
template:
metadata:
labels:
app: dcgm-exporter
annotations:
prometheus.io/scrape: 'true'
description: |
This `DaemonSet` provides GPU metrics in Prometheus format.
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: beta.kubernetes.io/instance-type
operator: In
values:
- p2.xlarge
- p2.8xlarge
- p2.16xlarge
- p3.2xlarge
- p3.8xlarge
- p3.16xlarge
- g3.4xlarge
- g3.8xlarge
- g3.16xlarge