I have a Kubernetes deployment that looks something like this (replaced names and other things with '....'):
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "3"
    kubernetes.io/change-cause: kubectl replace deployment ....
      -f - --record
  creationTimestamp: 2016-08-20T03:46:28Z
  generation: 8
  labels:
    app: ....
  name: ....
  namespace: default
  resourceVersion: "369219"
  selfLink: /apis/extensions/v1beta1/namespaces/default/deployments/....
  uid: aceb2a9e-6688-11e6-b5fc-42010af000c1
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ....
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: ....
    spec:
      containers:
      - image: gcr.io/..../....:0.2.1
        imagePullPolicy: IfNotPresent
        name: ....
        ports:
        - containerPort: 8080
          protocol: TCP
        resources:
          requests:
            cpu: "0"
        terminationMessagePath: /dev/termination-log
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 2
  observedGeneration: 8
  replicas: 2
  updatedReplicas: 2
The problem I'm observing is that Kubernetes places both replicas (in the deployment I've asked for two) on the same node. If that node goes down, I lose both containers and the service goes offline.
What I want Kubernetes to do is to ensure that it doesn't double up containers on the same node where the containers are the same type - this only consumes resources and doesn't provide any redundancy. I've looked through the documentation on deployments, replica sets, nodes etc. but I couldn't find any options that would let me tell Kubernetes to do this.
Is there a way to tell Kubernetes how much redundancy across nodes I want for a container?
EDIT: I'm not sure labels will work; labels constrain where a pod will run so that it has access to local resources (SSDs) etc. All I want to do is ensure no downtime if a node goes offline.
Maybe a DaemonSet will work better. I'm using DaemonSets with nodeSelector to run pods on specific nodes and avoid duplication.
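A minimal sketch of what that could look like, assuming the current `apps/v1` API. The name `my-app`, the node label `role: frontend`, and the image are hypothetical placeholders, not values from the question:

```yaml
# Sketch only: my-app, role=frontend and the image are placeholders.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: my-app
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      # Only nodes carrying this label get a copy of the pod.
      nodeSelector:
        role: frontend
      containers:
      - name: my-app
        image: example.com/my-app:0.2.1   # placeholder image
        ports:
        - containerPort: 8080
```

A DaemonSet runs exactly one pod on each matching node, so the number of copies follows the number of labeled nodes (for example after `kubectl label nodes <node-name> role=frontend`) rather than a `replicas` field.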
If a node goes down, any pods running on it would be restarted automatically on another node.
If you start specifying exactly where you want them to run, you actually lose Kubernetes' ability to reschedule them on a different node.
The usual practice therefore is to simply let Kubernetes do its thing.
If, however, you do have a valid requirement to run a pod on a specific node, for example because it needs a certain local volume type, have a read of:
I think you're looking for the affinity/anti-affinity selectors.
Affinity is for co-locating pods: for example, I want my website to try to schedule on the same host as my cache. Anti-affinity is the opposite: it keeps a pod off a host according to a set of rules, such as "no other pod with this label is already running there".
So for what you're doing, I would take a closer look at these two links:
https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#never-co-located-in-the-same-node
https://kubernetes.io/docs/tutorials/stateful-application/zookeeper/#tolerating-node-failure
If you create a Service for that Deployment before creating the Deployment itself, Kubernetes will spread your pods across nodes. This behavior comes from the scheduler and is provided on a best-effort basis, provided that enough resources are available on both nodes.
From the Kubernetes documentation (Managing Resources):
it’s best to specify the service first, since that will ensure the scheduler can spread the pods associated with the service as they are created by the controller(s), such as Deployment.
Also related: Configuration best practices - Service.
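As a rough illustration of "Service first", a Service like the following would be applied before the Deployment. The name `my-app`, the label value, and port 80 are assumptions for the sketch; the selector has to match the `app` label on the Deployment's pod template, and `targetPort` mirrors the container port 8080 from the question:

```yaml
# Sketch only: my-app and port 80 are placeholders.
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  # Must match the labels on the Deployment's pod template.
  selector:
    app: my-app
  ports:
  - protocol: TCP
    port: 80          # port exposed by the Service (assumed)
    targetPort: 8080  # containerPort of the pods
```

Since the spreading is best-effort, this makes an even distribution more likely but does not guarantee that the two replicas end up on different nodes.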
I agree with Antoine Cotten that you should use a Service for your Deployment. The Service keeps routing traffic to the healthy pods, while the Deployment creates a replacement if, for some reason, a pod dies on a certain node. However, if you just want to distribute a Deployment across all nodes, you can use pod anti-affinity in your pod manifest file. I put an example on my GitLab page, which you can also find on the Kubernetes Blog. For your convenience, I'm providing the example here as well.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - nginx
            topologyKey: kubernetes.io/hostname
      containers:
      - name: nginx
        image: gcr.io/google_containers/nginx-slim:0.8
        ports:
        - containerPort: 80
In this example, the pods created by the Deployment carry the label app with the value nginx. In the pod spec, podAntiAffinity prevents two pods with that label (app: nginx) from being scheduled on the same node. You can also use podAffinity if you would like to co-locate pods from several Deployments on one node.
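One caveat with the required rule: if there are more replicas than eligible nodes, the extra pods stay Pending. A softer variant, sketched here as an assumption about what you may want rather than as part of the original example, uses `preferredDuringSchedulingIgnoredDuringExecution`, which asks the scheduler to avoid co-location but still allows it when no other node fits. The sketch also uses `apps/v1`, since `extensions/v1beta1` Deployments were removed in Kubernetes 1.16, and `apps/v1` requires an explicit `selector`:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 2
  # apps/v1 requires a selector matching the pod template labels.
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      affinity:
        podAntiAffinity:
          # Soft rule: prefer nodes without an app=nginx pod, but do not block scheduling.
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - nginx
              topologyKey: kubernetes.io/hostname
      containers:
      - name: nginx
        image: gcr.io/google_containers/nginx-slim:0.8
        ports:
        - containerPort: 80
```

Use the required form when two replicas on one node is never acceptable, and the preferred form when keeping every replica running matters more than strict separation.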