How to automatically stop rolling update when CrashLoopBackOff?

8/31/2018

I use Google Kubernetes Engine, and I intentionally put an error in the code. I was hoping the rolling update would stop when it discovered the pods' status was CrashLoopBackOff, but it didn't.

On this page, the documentation says:

The Deployment controller will stop the bad rollout automatically, and will stop scaling up the new ReplicaSet. This depends on the rollingUpdate parameters (maxUnavailable specifically) that you have specified.

But that's not happening. Does it only apply when the status is ImagePullBackOff?

Below is my configuration.

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: volume-service
  labels:
    group: volume
    tier: service
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 2
      maxSurge: 2
  template:
    metadata:
      labels:
        group: volume
        tier: service
    spec:
      containers:
      - name: volume-service
        image: gcr.io/example/volume-service:latest

P.S. I already read about liveness/readiness probes, but I don't think they can stop a rolling update. Or can they?

-- Hana Alaydrus
google-kubernetes-engine
kubernetes

2 Answers

8/31/2018

The explanation you quoted is correct: the new ReplicaSet (the one with the error) will not proceed to completion; its progression stops once it reaches the count allowed by maxSurge + maxUnavailable. The old ReplicaSet remains present as well.

Here is the example I tried:

spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1

And these are the results:

NAME                                  READY     STATUS             RESTARTS   AGE
pod/volume-service-6bb8dd677f-2xpwn   0/1       ImagePullBackOff   0          42s
pod/volume-service-6bb8dd677f-gcwj6   0/1       ImagePullBackOff   0          42s
pod/volume-service-c98fd8d-kfff2      1/1       Running            0          59s
pod/volume-service-c98fd8d-wcjkz      1/1       Running            0          28m
pod/volume-service-c98fd8d-xvhbm      1/1       Running            0          28m

NAME                                              DESIRED   CURRENT   READY     AGE
replicaset.extensions/volume-service-6bb8dd677f   2         2         0         26m
replicaset.extensions/volume-service-c98fd8d      3         3         3         28m

The new ReplicaSet starts only 2 new pods (1 slot from maxUnavailable and 1 slot from maxSurge).

The old ReplicaSet keeps running 3 pods (4 replicas - 1 unavailable).

The two parameters you set in the rollingUpdate section are the key point, but you can also tune other factors such as readinessProbe, livenessProbe, minReadySeconds, and progressDeadlineSeconds.
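As a sketch of how those factors fit together, here is a variant of your Deployment spec with a readiness probe and the two timing fields added. The /healthz path and port 8080 are placeholders; adjust them to whatever health endpoint your container actually exposes.

```yaml
spec:
  replicas: 4
  minReadySeconds: 30           # a pod must stay Ready this long before it counts as available
  progressDeadlineSeconds: 120  # after this, the Deployment reports ProgressDeadlineExceeded
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  template:
    metadata:
      labels:
        group: volume
        tier: service
    spec:
      containers:
      - name: volume-service
        image: gcr.io/example/volume-service:latest
        readinessProbe:          # pod is not Ready until this check passes
          httpGet:
            path: /healthz       # placeholder: your app's health endpoint
            port: 8080           # placeholder: your app's port
          initialDelaySeconds: 5
          periodSeconds: 10
```

Note that progressDeadlineSeconds only marks the Deployment's Progressing condition as False (reason ProgressDeadlineExceeded); it does not roll back automatically.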

The Deployment reference documentation covers all of them.

-- Nicola Ben
Source: StackOverflow

8/31/2018

It turns out I just needed to set minReadySeconds, and the rolling update stops when the new ReplicaSet's pods have a status such as CrashLoopBackOff or Exited with status code 1. Now the old ReplicaSet stays available and is not replaced.

Here is the new config.

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: volume-service
  labels:
    group: volume
    tier: service
spec:
  replicas: 4
  minReadySeconds: 60
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 2
      maxSurge: 2
  template:
    metadata:
      labels:
        group: volume
        tier: service
    spec:
      containers:
      - name: volume-service
        image: gcr.io/example/volume-service:latest
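With this config, the rollout stalls rather than failing outright, so it helps to know how to inspect and revert it. These are standard kubectl commands; the deployment name matches the config above.

```shell
# Watch the rollout; with a crashing image this will not complete
kubectl rollout status deployment/volume-service

# Inspect the stuck pods and the Deployment's conditions
kubectl get pods -l group=volume,tier=service
kubectl describe deployment volume-service

# Roll back to the previous (working) ReplicaSet
kubectl rollout undo deployment/volume-service
```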

Thank you, everyone, for your help!

-- Hana Alaydrus
Source: StackOverflow