I would like to know if there is a way to force Kubernetes, during a deployment, to use every node in the cluster. The question comes from some attempts I made where I noticed a situation like this:
a cluster of 3 nodes
I update a deployment with a command like: kubectl set image deployment/deployment_name my-container=my_repo:v2.1.2
Kubernetes updates the cluster
At the end I execute kubectl get pod
and I notice that 2 pods have been deployed on the same node, so after the update the cluster ends up with two pods on one node, one pod on another node, and no pod at all on the third.
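The pod-to-node assignment can be checked with the wide output of kubectl, for example (the app=webpod label comes from the deployment manifest shown below):
kubectl get pods -l app=webpod -o wide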
I tried some solutions, and what works for me at the moment is simply based on changing the image version in the yaml file of a DaemonSet controller.
I mean:
1) I deploy my application for the first time; it is based on a pod with some containers. These pods should be deployed on every cluster node (I have 3 nodes). In the Deployment yaml file I have set the replicas option equal to 3:
apiVersion: apps/v1beta2 # for versions before 1.8.0 use apps/v1beta1
kind: Deployment
metadata:
  name: my-deployment
  labels:
    app: webpod
spec:
  replicas: 3
  ....
I have also set up the DaemonSet (or ds) yaml file with the updateStrategy option set to RollingUpdate:
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: my-daemonset
spec:
  updateStrategy:
    type: RollingUpdate
  ...
The version used for one of my containers is, for example, 2.1.
2) I execute the deployment with the commands:
kubectl apply -f my-deployment.yaml
kubectl apply -f my-daemonset.yaml
3) I get one pod on every node without any problem.
4) Now I want to update the deployment, changing the version of the image used by one of my containers. So I simply edit the yaml file, replacing 2.1 with 2.2, and re-launch the command: kubectl apply -f my-deployment.yaml
Alternatively, I can simply change the image version (2.1 -> 2.2) with this command:
kubectl set image ds/my-daemonset my-container=my-repository:v2.2
5) Again, I obtain one pod on every node without any problem.
The behavior is very different if instead I use the command:
kubectl set image deployment/my-deployment my-container=xxxx:v2.2
In this case I get a wrong result: one node has 2 pods, another node has 1 pod, and the last node has no pod at all...
To see how the rollout evolves, I can launch the command:
kubectl rollout status ds/my-daemonset
getting something like this:
Waiting for rollout to finish: 0 out of 3 new pods have been updated...
Waiting for rollout to finish: 0 out of 3 new pods have been updated...
Waiting for rollout to finish: 1 out of 3 new pods have been updated...
Waiting for rollout to finish: 1 out of 3 new pods have been updated...
Waiting for rollout to finish: 1 out of 3 new pods have been updated...
Waiting for rollout to finish: 2 out of 3 new pods have been updated...
Waiting for rollout to finish: 2 out of 3 new pods have been updated...
Waiting for rollout to finish: 2 out of 3 new pods have been updated...
Waiting for rollout to finish: 2 of 3 updated pods are available...
daemon set "my-daemonset" successfully rolled out
The scheduler will try to figure out the most reasonable way of scheduling at a given point in time, which can change later on and result in situations like the one you described. Two simple ways to manage this, in one way or another, are:
use PodAntiAffinity: you can make sure that two pods of the same deployment, in the same version, are never deployed on the same node. This is what I personally prefer for many apps (unless I want more than one to be scheduled per node). Note that it will get into a bit of trouble if you decide to scale your deployment to more replicas than you have nodes. An example of the versioned PodAntiAffinity I use:
metadata:
  labels:
    app: {{ template "fullname" . }}
    version: {{ .Values.image.tag }}
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values: ["{{ template "fullname" . }}"]
          - key: version
            operator: In
            values: ["{{ .Values.image.tag }}"]
        topologyKey: kubernetes.io/hostname
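Outside of a Helm chart the same rule can be written with literal values. A minimal sketch, assuming the names from the question (app label my-deployment, image tag v2.2) and placed inside the Deployment's pod template:
metadata:
  labels:
    app: my-deployment      # pod labels the anti-affinity rule matches against
    version: v2.2
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values: ["my-deployment"]
          - key: version
            operator: In
            values: ["v2.2"]
        topologyKey: kubernetes.io/hostname   # at most one matching pod per node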
consider fiddling with the Descheduler, which is like an evil twin of the Kube Scheduler component: it deletes selected pods so that they get rescheduled differently.
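The Descheduler is driven by a policy file; as a rough sketch (assuming its v1alpha1 policy API), the RemoveDuplicates strategy is the one that targets exactly the "two pods of the same Deployment on one node" case:
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "RemoveDuplicates":   # evict duplicate pods of the same owner so they can be rescheduled onto other nodes
    enabled: true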