How to roll out Kubernetes updates in intervals

1/31/2017

We have a case where we need to make sure that the pods in k8s are always running the latest image version available. What is the best way to accomplish this?

Our first idea was to kill the pods after some time, knowing that the new ones will come up pulling the latest image. Here is what we found so far, but we still don't know how to do it.

Another idea is to have rolling-update executed at intervals, for example every 5 hours. Is there a way to do this?
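
For illustration, a rough sketch of what such an interval-based restart could look like (untested; it assumes a Deployment named yourapplication, kubectl 1.15 or newer for rollout restart, and a plain cron entry outside the cluster):

# crontab entry: trigger a rolling restart of the Deployment every 5 hours
0 */5 * * * kubectl rollout restart deployment/yourapplication

With imagePullPolicy: Always on the containers, every pod created by that restart pulls the image again, so a moving tag such as latest gets refreshed on each cycle.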

-- anvarik
kubernetes
kubernetes-health-check

4 Answers

1/28/2018

I tried using Pagid's solution, but unfortunately my observations and subsequent research indicate that his assertion that a failing container will restart the whole pod is incorrect. It turns out that only the failing container is restarted, which obviously does not help much when the point is to restart the other containers in the pod at random intervals.

The good news is that I have a solution that seems to work, which is based on his answer. Basically, instead of writing to /tmp/healthy, you write to a shared volume that each of the containers within the pod has mounted. You also need to add the liveness probe to each of those containers. Here's an example based on the one I am using:

  # shared in-memory volume; every container in the pod mounts it and checks for the same file
  volumes:
  - name: healthcheck
    emptyDir:
      medium: Memory
  containers:
    # stand-in for your real application container(s); each one mounts the shared
    # volume and probes for the marker file
    - image: alpine:latest
      volumeMounts:
        - mountPath: /healthcheck
          name: healthcheck
      name: alpine
      livenessProbe:
        exec:
          command:
          - cat
          - /healthcheck/healthy
        initialDelaySeconds: 5
        periodSeconds: 5
    # helper container: creates the marker file, sleeps a random 30-90 minutes, then removes it,
    # so every container's liveness probe starts failing and the kubelet restarts them
    - name: liveness
      args:
      - /bin/sh
      - -c
      - touch /healthcheck/healthy; sleep $(( RANDOM % (3600) + 1800 )); rm -rf /healthcheck/healthy; sleep 600
      image: gcr.io/google_containers/busybox
      volumeMounts:
        - mountPath: /healthcheck
          name: healthcheck
      livenessProbe:
        exec:
          command:
          - cat
          - /healthcheck/healthy
        initialDelaySeconds: 5
        periodSeconds: 5
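
To confirm that the restarts really happen across all containers, watching the restart counter and the pod events is usually enough (the pod name below is a placeholder):

kubectl get pods -w
kubectl describe pod <your-pod-name>   # look for "Liveness probe failed" events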
-- Jake Feasel
Source: StackOverflow

1/31/2017

As mentioned by @svenwltr, using activeDeadlineSeconds is an easy option, but it comes with the risk of losing all pods at once. To mitigate that risk I'd use a deployment to manage the pods and their rollout, and configure a small second container alongside the actual application. The small helper could be configured like this (following the official docs):

apiVersion: v1
kind: Pod
metadata:
  name: app-liveness
spec:
  containers:
  - name: liveness
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep $(( RANDOM % (3600) + 1800 )); rm -rf /tmp/healthy; sleep 600
    image: gcr.io/google_containers/busybox

    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5

  - name: yourapplication
    imagePullPolicy: Always
    image: nginx:alpine

With this configuration every pod would break randomly within the configured timeframe (here between 30 and 90 minutes), and that would trigger the start of a new pod. The imagePullPolicy: Always would then make sure that the image is updated during that cycle.

This of course assumes that your application versions are always available under the same name/tag.
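
As a rough, untested sketch of the deployment-managed variant (the name, labels, replica count and the apps/v1 apiVersion are just placeholders, not part of the example above):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-liveness
spec:
  replicas: 3
  selector:
    matchLabels:
      app: app-liveness
  template:
    metadata:
      labels:
        app: app-liveness
    spec:
      containers:
      # same helper container as in the pod example above
      - name: liveness
        image: gcr.io/google_containers/busybox
        args:
        - /bin/sh
        - -c
        - touch /tmp/healthy; sleep $(( RANDOM % (3600) + 1800 )); rm -rf /tmp/healthy; sleep 600
        livenessProbe:
          exec:
            command:
            - cat
            - /tmp/healthy
          initialDelaySeconds: 5
          periodSeconds: 5
      - name: yourapplication
        image: nginx:alpine
        imagePullPolicy: Always

With several replicas the random deadlines are unlikely to line up, so the whole service should not go offline at once.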

-- pagid
Source: StackOverflow

1/31/2017

Another alternative is to use a deployment and let the controller handle rollouts. To be more specific: if you update the image field in the deployment YAML, it automatically updates every pod. IMO that's the cleanest way, but it has some requirements:

  • You cannot use the latest tag. The assumption is that a container only needs an update when the image tag changes.
  • If an update happens, you have to update the image tag manually somehow. This might be done by a custom controller which checks for new tags and updates the deployment accordingly. Or it could be triggered by a Continuous Delivery system (see the sketch after this list).
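
For example, a CD pipeline (or a human) can bump the tag with a single command and then watch the rollout; the deployment name, container name and tag below are placeholders:

kubectl set image deployment/yourapplication yourapplication=nginx:1.21.6
kubectl rollout status deployment/yourapplication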
-- svenwltr
Source: StackOverflow

1/31/2017

To use your linked feature you just have to specify activeDeadlineSeconds in your pods.

An untested example:

apiVersion: v1
kind: Pod
metadata:
  name: "nginx"
spec:
  activeDeadlineSeconds: 3600
  containers:
  - name: nginx
    image: nginx:alpine
    imagePullPolicy: Always

The downside of this is that you cannot control when the deadline kicks in. This means it might happen that all your pods get killed at the same time and the whole service goes offline (that depends on your application).

-- svenwltr
Source: StackOverflow