I am having a problem where I am trying to restrict a deployment to avoid a specific node pool, and nodeAffinity and nodeAntiAffinity don't seem to be working.
No matter what configuration I use, Kubernetes appears to schedule the pod randomly across both node pools.
See my configuration below, along with the scheduling results.
deployment.yaml snippet
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wordpress
  namespace: "test"
  labels:
    app: wordpress
    client: "test"
    product: hosted-wordpress
    version: v1
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  selector:
    matchLabels:
      app: wordpress
      client: "test"
  template:
    metadata:
      labels:
        app: wordpress
        client: "test"
        product: hosted-wordpress
        version: v1
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: doks.digitalocean.com/node-pool
                operator: NotIn
                values:
                - infra
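As a sanity check, the same NotIn expression can be tested from the command line with a standard set-based label selector (using the label value from the node description below) to see which nodes it should allow:
# list the nodes the NotIn term should allow (everything outside the infra pool)
kubectl get nodes -l 'doks.digitalocean.com/node-pool notin (infra)'
# show the node-pool label on every node for comparison
kubectl get nodes -L doks.digitalocean.com/node-pool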
node description snippet (note the label doks.digitalocean.com/node-pool=infra):
kubectl describe node infra-3dmga
Name:               infra-3dmga
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=s-2vcpu-4gb
                    beta.kubernetes.io/os=linux
                    doks.digitalocean.com/node-id=67d84a52-8d08-4b19-87fe-1d837ba46eb6
                    doks.digitalocean.com/node-pool=infra
                    doks.digitalocean.com/node-pool-id=2e0f2a1d-fbfa-47e9-9136-c897e51c014a
                    doks.digitalocean.com/version=1.19.3-do.2
                    failure-domain.beta.kubernetes.io/region=tor1
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=infra-3dmga
                    kubernetes.io/os=linux
                    node.kubernetes.io/instance-type=s-2vcpu-4gb
                    region=tor1
                    topology.kubernetes.io/region=tor1
Annotations:        alpha.kubernetes.io/provided-node-ip: 10.137.0.230
                    csi.volume.kubernetes.io/nodeid: {"dobs.csi.digitalocean.com":"222551559"}
                    io.cilium.network.ipv4-cilium-host: 10.244.0.139
                    io.cilium.network.ipv4-health-ip: 10.244.0.209
                    io.cilium.network.ipv4-pod-cidr: 10.244.0.128/25
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Sun, 20 Dec 2020 20:17:20 -0800
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:   infra-3dmga
  AcquireTime:      <unset>
  RenewTime:        Fri, 12 Feb 2021 08:04:09 -0800
Sometimes this results in:
kubectl get po -n test -o wide
NAME                         READY   STATUS    RESTARTS   AGE   IP             NODE          NOMINATED NODE   READINESS GATES
wordpress-5bfcb6f44b-2j7kv   5/5     Running   0          1h    10.244.0.107   infra-3dmga   <none>           <none>
Other times it results in:
kubectl get po -n test -o wide
NAME                         READY   STATUS    RESTARTS   AGE   IP             NODE            NOMINATED NODE   READINESS GATES
wordpress-5bfcb6f44b-b42wj   5/5     Running   0          5m    10.244.0.107   clients-3dmem   <none>           <none>
I have tried using nodeAntiAffinity to similar effect.
And lastly, I have even tried creating test labels instead of using the built-in labels from DigitalOcean, and I get the same effect (affinity just doesn't seem to be working for me at all).
I am hoping that someone can help me resolve this, or even point out a silly mistake in my config, because this issue has been driving me nuts (and it is a genuinely useful feature when it works).
Thank you,
In the deployment file, you have used operator: NotIn, which works as anti-affinity: it keeps pods off the matching nodes. Please use operator: In to achieve node affinity. So, for instance, if we want pods to run only on nodes that carry the clients label:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wordpress
  namespace: "test"
  labels:
    app: wordpress
    client: "test"
    product: hosted-wordpress
    version: v1
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  selector:
    matchLabels:
      app: wordpress
      client: "test"
  template:
    metadata:
      labels:
        app: wordpress
        client: "test"
        product: hosted-wordpress
        version: v1
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: "doks.digitalocean.com/node-pool"
                operator: In
                values: ["clients"]  # use the correct label value for your node pool
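Also note that requiredDuringSchedulingIgnoredDuringExecution is only evaluated when a pod is scheduled, so pods that are already running will not move on their own. Re-creating them, for example with the command below (using the deployment name and namespace from the question), forces the scheduler to apply the rule:
kubectl rollout restart deployment/wordpress -n test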
Great news!
I have finally resolved this issue.
The problem was "user error" of course.
There was an extra spec: line further down in the config that was very well hidden.
Originally, before switching to StatefulSets, we were using Deployments, and I had a leftover pod spec hostname entry that was overriding the spec at the top of the file.
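For anyone chasing the same symptom, here is a rough sketch of what that looks like (reconstructed for illustration, not my exact file; the hostname and container values are made up): a second spec: key left in the pod template. Depending on the YAML parser and tooling version, a duplicate key may be rejected, warned about, or silently resolved as "last one wins", in which case the affinity block never reaches the API server.
  template:
    metadata:
      labels:
        app: wordpress
    spec:              # the spec I thought was in effect
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: doks.digitalocean.com/node-pool
                operator: NotIn
                values:
                - infra
    spec:              # leftover duplicate further down; here it overrides the spec above
      hostname: wordpress       # hypothetical value for illustration
      containers:
      - name: wordpress
        image: wordpress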
Thanks @WytrzymałyWiktor and @Manjul for the suggestions!