I've setup a Kubernetes 1.5 cluster with the three master nodes tainted dedicated=master:NoSchedule. Now I want to deploy the Nginx Ingress Controller on the Master nodes only so I've added tolerations:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: nginx-ingress-controller
namespace: kube-system
labels:
kubernetes.io/cluster-service: "true"
spec:
replicas: 3
template:
metadata:
labels:
k8s-app: nginx-ingress-lb
name: nginx-ingress-lb
annotations:
scheduler.alpha.kubernetes.io/tolerations: |
[
{
"key": "dedicated",
"operator": "Equal",
"value": "master",
"effect": "NoSchedule"
}
]
spec:
[…]
Unfortunately this does not have the desired effect: Kubernetes schedules all Pods on the workers. When scaling the number of replicas to a larger number the Pods are deployed on the workers, too.
How can I achieve scheduling to the Master nodes only?
Thanks for your help.
tolerations:
- key: node-role.kubernetes.io/master
effect: NoSchedule
A toleration does not mean that the pod must be scheduled on a node with such taints. It means that the pod tolerates such a taint. If you want your pod to be "attracted" to specific nodes you will need to attach a label to your dedicated=master tainted nodes and set nodeSelector in the pod to look for such label.
Attach the label to each of your special use nodes:
kubectl label nodes name_of_your_node dedicated=master
Add the nodeSelector to your pod:
apiVersion: apps/v1beta1
kind: Deployment
metadata:
name: nginx-ingress-controller
namespace: kube-system
labels:
kubernetes.io/cluster-service: "true"
spec:
replicas: 3
template:
metadata:
labels:
k8s-app: nginx-ingress-lb
name: nginx-ingress-lb
annotations:
spec:
nodeSelector:
dedicated: master
tolerations:
- key: dedicated
operator: Equal
value: master
effect: NoSchedule
[…]
If you don't fancy nodeSelector
you can add affinity:
under spec:
instead:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
matchExpressions:
- key: dedicated
operator: Equal
values: ["master"]
Add the nodeSelector to your pod:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: nginx-ingress-controller
namespace: kube-system
labels:
kubernetes.io/cluster-service: "true"
spec:
replicas: 3
template:
metadata:
labels:
k8s-app: nginx-ingress-lb
name: nginx-ingress-lb
annotations:
scheduler.alpha.kubernetes.io/tolerations: |
[
{
"key": "dedicated",
"operator": "Equal",
"value": "master",
"effect": "NoSchedule"
}
]
spec:
nodeSelector:
dedicated: master
[…]
If you don't fancy nodeSelector
you can also add an annotation like this:
scheduler.alpha.kubernetes.io/affinity: >
{
"nodeAffinity": {
"requiredDuringSchedulingIgnoredDuringExecution": {
"nodeSelectorTerms": [
{
"matchExpressions": [
{
"key": "dedicated",
"operator": "Equal",
"values": ["master"]
}
]
}
]
}
}
}
Keep in mind that NoSchedule will not evict pods that are already scheduled.
The information above is from https://kubernetes.io/docs/user-guide/node-selection/ and there are more details there.
you might want to dive into the Assigning Pods to Nodes documentation. Basically you should add some labels to your nodes with sth like this:
kubectl label nodes <node-name> <label-key>=<label-value>
and then reference that within your Pod
specification like this:
apiVersion: v1
kind: Pod
metadata:
name: nginx
spec:
containers:
- name: nginx
image: nginx
nodeSelector:
label: value
But I'm not sure if this works for non-critical addons when the specific node is tainted. More details could be found here