I have 4 nodes: one System, one Dev, one QA and one UAT. My affinity configuration is as follows:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: auth
  namespace: dev
  labels:
    app: auth
    environment: dev
    app-role: api
    tier: backend
spec:
  replicas: 1
  selector:
    matchLabels:
      app: auth
  template:
    metadata:
      labels:
        app: auth
        environment: dev
        app-role: api
        tier: backend
      annotations:
        build: _{Tag}_
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - auth
            topologyKey: kubernetes.io/hostname
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: environment
                operator: In
                values:
                - dev
      containers:
      - name: companyauth
        image: company.azurecr.io/auth:_{Tag}_
        imagePullPolicy: Always
        env:
        - name: ConnectionStrings__DevAuth
          value: dev
        ports:
        - containerPort: 80
      imagePullSecrets:
      - name: ips
My intention is that on my production cluster, which has 3 nodes in 3 different availability zones, all the pods are scheduled on a different node/availability zone. However, it appears that if I already have pods scheduled on a node, then when I do a deployment the new pods will not replace the pods that already exist.
0/4 nodes are available: 1 node(s) didn't match pod affinity/anti-affinity, 3 node(s) didn't match node selector.
However, if I remove the podAntiAffinity, it works fine and the new pod from the deployment replaces the existing one on the node. What is the correct way to ensure that my deployment on the production cluster always has a pod scheduled on a different node in a different availability zone, while still being able to update the pods on the existing nodes?
Your node affinity rule mandates that only the Dev node will be considered for scheduling. In combination with your podAntiAffinity rule this means that only one Pod can be scheduled (the one on the Dev node).
To get even scheduling across nodes, you will either have to add additional Dev nodes or remove the nodeAffinity rule.
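If you prefer to keep the nodeAffinity rule, any additional nodes would need to carry the label the rule selects on. A hypothetical example (the node name is a placeholder):
$ kubectl label node <node-name> environment=dev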
Your goal can be achieved using podAntiAffinity only.
I have tested this on my GKE test cluster, but it should work similarly on Azure.
In your current setup, you have combined podAntiAffinity with nodeAffinity.
Pod anti-affinity prevents the scheduler from placing a new pod on the same node as existing pods when the label selector on the new pod matches labels on those pods.
In your Deployment, new pods will have the following labels:
app: auth
environment: dev
app-role: api
tier: backend
podAntiAffinity was configured to disallow scheduling a new pod on a node that already runs a pod with the label app: auth.
nodeAffinity was configured to deploy only on a node with the label environment: dev.
To sum up, your error:
0/4 nodes are available: 1 node(s) didn't match pod affinity/anti-affinity, 3 node(s) didn't match node selector.
1 node(s) didn't match pod affinity/anti-affinity
Your setup allows pods only on the node labeled environment: dev, and only one pod with the label app: auth per node. As you mention:
if I already have pods scheduled on a node, then when I do a deployment it will not overwrite the pods that already exist.
The podAntiAffinity behavior worked as intended and did not allow a new pod with the label app: auth to be scheduled, as there was already one on that node.
3 node(s) didn't match node selector.
nodeAffinity allows pods only on the node with the label environment: dev. The other nodes probably have labels like environment: system, environment: uat, environment: qa, which do not match environment: dev and therefore did not match the node selector.
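To verify which label each node carries, you can print the environment label as an extra column (output will vary with your cluster):
$ kubectl get nodes -L environment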
The easiest way is to remove the nodeAffinity.
As long as topologyKey is set to kubernetes.io/hostname in the podAntiAffinity, that is enough.
The topologyKey uses the default label attached to a node to dynamically filter on the name of the node.
For more information, please check this article.
If you describe your nodes and grep for kubernetes.io/hostname, you will see a unique value for each node:
$ kubectl describe node | grep kubernetes.io/hostname
kubernetes.io/hostname=gke-affinity-default-pool-27d6eabd-vhss
kubernetes.io/hostname=gke-affinity-default-pool-5014ecf7-5tkh
kubernetes.io/hostname=gke-affinity-default-pool-c2afcc97-clg9
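Since your production nodes sit in three different availability zones, you could also spread by zone rather than by hostname. A minimal sketch of only the affinity block, assuming your nodes carry the well-known topology.kubernetes.io/zone label (older clusters may expose failure-domain.beta.kubernetes.io/zone instead):
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - auth
            # one pod with app=auth per availability zone instead of per node
            topologyKey: topology.kubernetes.io/zone
Keeping topologyKey: kubernetes.io/hostname, as in the Deployment below, already gives you one pod per node, and on a three-node cluster with one node per zone that also means one pod per zone: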
apiVersion: apps/v1
kind: Deployment
metadata:
  name: auth
  labels:
    app: auth
    environment: dev
    app-role: api
    tier: backend
spec:
  replicas: 3
  selector:
    matchLabels:
      app: auth
  template:
    metadata:
      labels:
        app: auth
        environment: dev
        app-role: api
        tier: backend
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - auth
            topologyKey: kubernetes.io/hostname
      containers:
      - name: nginx
        image: nginx
        imagePullPolicy: Always
        ports:
        - containerPort: 80
After deploying this YAML, each pod lands on a different node:
$ kubectl get po -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
auth-7fccf5f7b8-4dkc4 1/1 Running 0 9s 10.0.1.9 gke-affinity-default-pool-c2afcc97-clg9 <none> <none>
auth-7fccf5f7b8-5qgt4 1/1 Running 0 8s 10.0.2.6 gke-affinity-default-pool-5014ecf7-5tkh <none> <none>
auth-7fccf5f7b8-bdmtw 1/1 Running 0 8s 10.0.0.9 gke-affinity-default-pool-27d6eabd-vhss <none> <none>
If you increase replicas to 7, no additional pods will be deployed. All new pods will be stuck in the Pending state, because the podAntiAffinity rule is doing its job (each node already has a pod with the label app: auth); a soft alternative is sketched after the output below.
$ kubectl get po -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
auth-7fccf5f7b8-4299k 0/1 Pending 0 79s <none> <none> <none> <none>
auth-7fccf5f7b8-4dkc4 1/1 Running 0 2m1s 10.0.1.9 gke-affinity-default-pool-c2afcc97-clg9 <none> <none>
auth-7fccf5f7b8-556h5 0/1 Pending 0 78s <none> <none> <none> <none>
auth-7fccf5f7b8-5qgt4 1/1 Running 0 2m 10.0.2.6 gke-affinity-default-pool-5014ecf7-5tkh <none> <none>
auth-7fccf5f7b8-bdmtw 1/1 Running 0 2m 10.0.0.9 gke-affinity-default-pool-27d6eabd-vhss <none> <none>
auth-7fccf5f7b8-q4s2c 0/1 Pending 0 79s <none> <none> <none> <none>
auth-7fccf5f7b8-twb9j 0/1 Pending 0 79s <none> <none> <none> <none>
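If you ever need more replicas than nodes while still spreading them as evenly as possible, you could switch to the soft rule preferredDuringSchedulingIgnoredDuringExecution; the scheduler then prefers to spread pods but will still place extras on nodes that already run one. A sketch of just the affinity block:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - auth
              topologyKey: kubernetes.io/hostname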
A similar solution is described in the High-Availability Deployment of Pods on Multi-Zone Worker Nodes blog post.