Pods not replaced if MaxUnavailable set to 0 in Kubernetes

1/3/2019

I want rolling deployments for my pods. I'm updating my pod using `set image` in a CI environment. When I set maxUnavailable to 1 in the Deployment/web file, I get downtime, but when I set maxUnavailable to 0, the pods do not get replaced and the container/app is not restarted.
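For reference, the CI step is roughly the following (the image tag is a placeholder; the deployment and container names are taken from the YAML below):

    kubectl set image deployment/dev-web web-container=gcr.io/myimagepath/web-node:<new-tag>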

Also, I have a single node in my Kubernetes cluster. Here is its info:

    Allocated resources:
      (Total limits may be over 100 percent, i.e., overcommitted.)
      CPU Requests  CPU Limits  Memory Requests  Memory Limits
      ------------  ----------  ---------------  -------------
      881m (93%)    396m (42%)  909712Ki (33%)   1524112Ki (56%)
    Events:         <none>

Here's the complete YAML file. I do have a readiness probe set.

            apiVersion: extensions/v1beta1
            kind: Deployment
            metadata:
              annotations:
                deployment.kubernetes.io/revision: "10"
                kompose.cmd: C:\ProgramData\chocolatey\lib\kubernetes-kompose\tools\kompose.exe
                  convert
                kompose.version: 1.14.0 (fa706f2)
                kubectl.kubernetes.io/last-applied-configuration: |
                  {"apiVersion":"extensions/v1beta1","kind":"Deployment","metadata":{"annotations":{"kompose.cmd":"C:\\ProgramData\\chocolatey\\lib\\kubernetes-kompose\\tools\\kompose.exe convert","kompose.version":"1.14.0 (fa706f2)"},"creationTimestamp":null,"labels":{"io.kompose.service":"dev-web"},"name":"dev-web","namespace":"default"},"spec":{"replicas":1,"strategy":{},"template":{"metadata":{"labels":{"io.kompose.service":"dev-web"}},"spec":{"containers":[{"env":[{"name":"JWT_KEY","value":"ABCD"},{"name":"PORT","value":"2000"},{"name":"GOOGLE_APPLICATION_CREDENTIALS","value":"serviceaccount/quick-pay.json"},{"name":"mongoCon","value":"mongodb://quickpayadmin:quickpay1234@ds121343.mlab.com:21343/quick-pay-db"},{"name":"PGHost","value":"173.255.206.177"},{"name":"PGUser","value":"postgres"},{"name":"PGDatabase","value":"quickpay"},{"name":"PGPassword","value":"z33shan"},{"name":"PGPort","value":"5432"}],"image":"gcr.io/quick-pay-208307/quickpay-dev-node:latest","imagePullPolicy":"Always","name":"dev-web-container","ports":[{"containerPort":2000}],"readinessProbe":{"failureThreshold":3,"httpGet":{"path":"/","port":2000,"scheme":"HTTP"},"initialDelaySeconds":5,"periodSeconds":5,"successThreshold":1,"timeoutSeconds":1},"resources":{"requests":{"cpu":"20m"}}}]}}}}
              creationTimestamp: 2018-12-24T12:13:48Z
              generation: 12
              labels:
                io.kompose.service: dev-web
              name: dev-web
              namespace: default
              resourceVersion: "9631122"
              selfLink: /apis/extensions/v1beta1/namespaces/default/deployments/web
              uid: 5e66f7b3-0775-11e9-9653-42010a80019d
            spec:
              progressDeadlineSeconds: 600
              replicas: 2
              revisionHistoryLimit: 10
              selector:
                matchLabels:
                  io.kompose.service: web
              strategy:
                rollingUpdate:
                  maxSurge: 1
                  maxUnavailable: 0
                type: RollingUpdate
              template:
                metadata:
                  creationTimestamp: null
                  labels:
                    io.kompose.service: web
                spec:
                  containers:
                  - env:
                    - name: PORT
                      value: "2000"

                    image: gcr.io/myimagepath/web-node
                    imagePullPolicy: Always
                    name: web-container
                    ports:
                    - containerPort: 2000
                      protocol: TCP
                    readinessProbe:
                      failureThreshold: 10
                      httpGet:
                        path: /
                        port: 2000
                        scheme: HTTP
                      initialDelaySeconds: 10
                      periodSeconds: 10
                      successThreshold: 1
                      timeoutSeconds: 10
                    resources:
                      requests:
                        cpu: 10m
                    terminationMessagePath: /dev/termination-log
                    terminationMessagePolicy: File
                  dnsPolicy: ClusterFirst
                  restartPolicy: Always
                  schedulerName: default-scheduler
                  securityContext: {}
                  terminationGracePeriodSeconds: 30
            status:
              availableReplicas: 2
              conditions:
              - lastTransitionTime: 2019-01-03T05:49:46Z
                lastUpdateTime: 2019-01-03T05:49:46Z
                message: Deployment has minimum availability.
                reason: MinimumReplicasAvailable
                status: "True"
                type: Available
              - lastTransitionTime: 2018-12-24T12:13:48Z
                lastUpdateTime: 2019-01-03T06:04:24Z
                message: ReplicaSet "dev-web-7bd498fc74" has successfully progressed.
                reason: NewReplicaSetAvailable
                status: "True"
                type: Progressing
              observedGeneration: 12
              readyReplicas: 2
              replicas: 2
              updatedReplicas: 2

I've tried with 1 replica and it still does not work.

-- Alamgir Qazi
deployment
kubernetes

1 Answer

1/3/2019

In the first scenario, Kubernetes deletes one old pod (maxUnavailable: 1), starts a pod with the new image, and waits ~110 seconds (based on your readiness probe) to check whether the new pod is able to serve requests. The new pod is not able to serve requests yet, but it is in the Running state, so Kubernetes deletes the second old pod and starts it with the new image, and that second pod again waits for its readiness probe to pass. This is why there is a window during which neither container is ready to serve requests, hence the downtime.
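The ~110 second figure comes from the readiness probe settings in the Deployment above, roughly:

    initialDelaySeconds + failureThreshold * periodSeconds
    = 10s + 10 * 10s
    = 110s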

In the second scenario, where you have maxUnavailable: 0, Kubernetes first brings up a pod with the new image. That pod is not able to serve requests within ~110 seconds (based on your readiness probe), so the probe times out and the new pod with the new image is deleted. The same happens with the second pod. Hence neither of your pods gets updated.

So the reason is that you are not giving your application enough time to come up and start serving requests. Increase the value of failureThreshold in your readiness probe and keep maxUnavailable: 0; it should then work.
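As a sketch of the relevant sections (the failureThreshold value below is only an assumption; choose one large enough for your app's actual startup time):

    spec:
      strategy:
        rollingUpdate:
          maxSurge: 1
          maxUnavailable: 0
        type: RollingUpdate
      template:
        spec:
          containers:
          - name: web-container
            readinessProbe:
              httpGet:
                path: /
                port: 2000
                scheme: HTTP
              initialDelaySeconds: 10
              periodSeconds: 10
              timeoutSeconds: 10
              successThreshold: 1
              failureThreshold: 30   # assumption: allows ~5 minutes for the app to become ready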

-- Prafull Ladha
Source: StackOverflow