We have a case where we need to make sure that pods in k8s run the latest possible version of the image. What is the best way to accomplish this?
Our first idea was to kill each pod after some time, knowing that the replacements will come up pulling the latest image. Here is what we found so far, but we still don't know how to do it.
Another idea is having a rolling update executed in intervals, like every 5 hours. Is there a way to do this?
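For example, I imagine something like a CronJob that periodically triggers a rollout restart of the deployment. This is only a sketch: the deployment name `myapp`, the `deployment-restarter` service account (which would need RBAC permission to patch deployments), and the `bitnami/kubectl` image are all assumptions, not something we have running:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: restart-myapp
spec:
  schedule: "0 */5 * * *"   # at minute 0 of every 5th hour
  jobTemplate:
    spec:
      template:
        spec:
          # assumed service account with RBAC rights to patch deployments
          serviceAccountName: deployment-restarter
          restartPolicy: Never
          containers:
          - name: kubectl
            image: bitnami/kubectl:latest   # any image shipping kubectl would do
            command:
            - kubectl
            - rollout
            - restart
            - deployment/myapp
```

Would something along these lines be considered good practice, or is there a more idiomatic mechanism?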
I tried using Pagid's solution, but unfortunately my observation and subsequent research indicate that his assertion that a failing container will restart the whole pod is incorrect. It turns out that only the failing container will be restarted, which obviously does not help much when the point is to restart the other containers in the pod at random intervals.
The good news is that I have a solution that seems to work, based on his answer. Basically, instead of writing to /tmp/healthy, you write to a shared volume which each of the containers within the pod has mounted. You also need to add the liveness probe to each of those containers. Here's an example based on the one I am using:
```yaml
volumes:
- name: healthcheck
  emptyDir:
    medium: Memory
containers:
- name: alpine
  image: alpine:latest
  volumeMounts:
  - mountPath: /healthcheck
    name: healthcheck
  livenessProbe:
    exec:
      command:
      - cat
      - /healthcheck/healthy
    initialDelaySeconds: 5
    periodSeconds: 5
- name: liveness
  image: gcr.io/google_containers/busybox
  args:
  - /bin/sh
  - -c
  - touch /healthcheck/healthy; sleep $(( RANDOM % (3600) + 1800 )); rm -rf /healthcheck/healthy; sleep 600
  volumeMounts:
  - mountPath: /healthcheck
    name: healthcheck
  livenessProbe:
    exec:
      command:
      - cat
      - /healthcheck/healthy
    initialDelaySeconds: 5
    periodSeconds: 5
```
As mentioned by @svenwltr, using `activeDeadlineSeconds` is an easy option, but it comes with the risk of losing all pods at once. To mitigate that risk I'd use a deployment to manage the pods and their rollout, and configure a small second container along with the actual application. The small helper could be configured like this (following the official docs):
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-liveness
spec:
  containers:
  - name: liveness
    image: gcr.io/google_containers/busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep $(( RANDOM % (3600) + 1800 )); rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
  - name: yourapplication
    image: nginx:alpine
    imagePullPolicy: Always
```
With this configuration every pod would break randomly within the configured timeframe (here between 30 and 90 minutes), and that would trigger the start of a new pod. The `imagePullPolicy: Always` would then make sure that the image is updated during that cycle.
This of course assumes that your application versions are always available under the same name/tag.
Another alternative is to use a deployment and let the controller handle rollouts. To be more specific: if you update the `image` field in the deployment YAML, it automatically updates every pod. IMO that's the cleanest way, but it has some requirements:

- You cannot use the `latest` tag. The assumption is that a container only needs an update when the image tag changes.

To use your linked feature you just have to specify `activeDeadlineSeconds` in your pods.
Not tested example:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: "nginx"
spec:
  activeDeadlineSeconds: 3600
  containers:
  - name: nginx
    image: nginx:alpine
    imagePullPolicy: Always
```
The downside of this is that you cannot control when the deadline kicks in. This means it might happen that all your pods get killed at the same time and the whole service goes offline (that depends on your application).
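For reference, the deployment-based variant described above could be sketched like this. The name and the versioned tag `nginx:1.25` are assumptions for illustration; the point is that bumping the tag in the `image` field is what makes the controller roll the pods, a few at a time rather than all at once:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        # bump this tag (e.g. to nginx:1.26) to trigger a rolling update
        image: nginx:1.25
```

This is why the `latest` tag doesn't work with this approach: if the tag never changes, the controller sees nothing to roll out.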