StatefulSet: Longer rolling update lead Version mismatching

1/2/2022

Application is deployed on K8s using StatefulSet because of stateful in nature. There is around 250+ pods are running and HPA has been implemented on it too that can scale upto 400 pods.

When new deployment occurs, it takes longer time (~ 10-15m) to update all pods in Rolling Update fashion.

Problem: End user get response from 2 version of pods until all pods are replaced with new revision.

I am googling for an architecture where overall deployment time can be reduced and getting the best possible solutions to use BLUE/GREEN strategy but it has bunch of impact with integrated services like monitoring, logging, telemetry etc because of 2 naming conventions.

Ideally I am looking for a solutions like maxSurge for Deployment in which firstly new pods are created and then traffic are shifted to it at a time but in case of StatefulSet, it won't support maxSurge with RollingUpdate strategy & controller will delete and recreate each Pod in the StatefulSet based on ordinal index from bigger to smaller.

-- Ashish Kumar
amazon-eks
azure-aks
google-kubernetes-engine
kubernetes
kubernetes-helm

1 Answer

1/4/2022

The solution is to do a partitioning rolling update along with a canary deployment.

Let’s suppose we have the statefulset workload defined by the following yaml file:

apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
    version: "1.20"
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: nginx
    version: "1.20"
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  selector:
    matchLabels:
      app: nginx # Label selector that determines which Pods belong to the StatefulSet
                 # Must match spec: template: metadata: labels
  serviceName: "nginx"
  replicas: 3
  template:
    metadata:
      labels:
        app: nginx # Pod template's label selector
        version: "1.20"
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: nginx
        image: nginx:1.20
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi

You could patch the statefulset to create a partition, and change the image and version label for the remaining pods: (In this case, since there are only 3 pods, the last one will be the one that will change its image.)

$ kubectl patch statefulset web -p '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":2}}}}'
$ kubectl patch statefulset web --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/image", "value":"nginx:1.21"}]'
$ kubectl patch statefulset web --type='json' -p='[{"op": "replace", "path": "/spec/template/metadata/labels/version", "value":"1.21"}]'

At this point, you have a pod with the new image and version label ready to use, but since the version label is different, the traffic is still going to the other two pods. If you change the version in the yaml file and apply the new configuration, the rollout will be transparent, since there is already a pod ready to migrate the traffic:

apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
    version: "1.21"
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: nginx
    version: "1.21"
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  selector:
    matchLabels:
      app: nginx # Label selector that determines which Pods belong to the StatefulSet
                 # Must match spec: template: metadata: labels
  serviceName: "nginx"
  replicas: 3
  template:
    metadata:
      labels:
        app: nginx # Pod template's label selector
        version: "1.21"
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: nginx
        image: nginx:1.21
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi
$ kubectl apply -f file-name.yaml

Once traffic is migrated to the pod containing the new image and version label, you should patch again the statefulset and remove the partition with the command kubectl patch statefulset web -p '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":0}}}}'

Note: You will need to be very careful with the size of the partition, since the remaining pods will handle the whole traffic for some time.

-- Gabriel Robledo Ahumada
Source: StackOverflow