Issue - Kubernetes Deployment with multiple active ReplicaSets

3/2/2020

I have a Kubernetes deployment which is the Owner/Parent of two ACTIVE ReplicaSets with different configurations.

This setup is managed by Helm.

I have tried setting revisionHistoryLimit: 0. This doesn't work because the old ReplicaSet is not inactive: it keeps trying to spin up a pod, which stays Pending because of the resource limits on the node.

I tried to update the Deployment and only the new ReplicaSet is updated. The old one remains the same.

I am not able to delete this ReplicaSet either, and it is causing a lot of trouble.

Could somebody help me with this issue?

Helm Deployment Template -

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: example
  replicas: 1
  template:
    metadata:
      labels:
        k8s-app: example
    spec:
      serviceAccountName: example
      nodeSelector:
        node-role: example-node
      containers:
      - name: example
        image: example-image:vX.X.X
        resources:
          requests:
            cpu: 100m
        ports:
        - name: example-port
          containerPort: XXXX
        - name: example-port-1
          containerPort: XXXX
        readinessProbe:
          httpGet:
            path: /example
            port: XXXX
          initialDelaySeconds: 5
          timeoutSeconds: 5
      - name: example-sidecar
        image: example-image-sidecar:vX.X.X
        resources:
          limits:
            memory: 400Mi
          requests:
            cpu: 100m
        env:
          - name: MY_POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: MY_POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
        command:
          - command
          - --container=example
          - --cpu=200m
          - --extra-cpu=10m
          - --memory=300Mi
          - --extra-memory=2Mi
          - --threshold=5
          - --deployment=example

Stack: Standalone K8s deployed on AWS EC2s using Kops and Helm 2.13.1

Outputs -

kubectl get rs -o wide -n kube-system | grep example

NAME               DESIRED  CURRENT READY   AGE     CONTAINERS IMAGES SELECTOR
example-6d4f99bc54 0        0       0       12h     example,example-sidecar example-image:vX.X.X,example-image-sidecar:vX.X.X k8s-app=example,pod-template-hash=6d4f99bc54
example-59d46955b6 0        0       0       13h     example,example-sidecar example-image:vX.X.X,example-image-sidecar:vX.X.X k8s-app=example,pod-template-hash=59d46955b6
example-5855866cdb 0        0       0       18h     example,example-sidecar example-image:vX.X.X,example-image-sidecar:vX.X.X k8s-app=example,pod-template-hash=5855866cdb
example-ccc5cf5cd0 0        0       0       18h     example,example-sidecar example-image:vX.X.X,example-image-sidecar:vX.X.X k8s-app=example,pod-template-hash=ccc5cf5cd
example-66db79f578 1        1       0       19h     example,example-sidecar example-image:vX.X.X,example-image-sidecar:vX.X.X k8s-app=example,pod-template-hash=66db79f578
example-759469945f 1        1       1       19h     example,example-sidecar example-image:vX.X.X,example-image-sidecar:vX.X.X k8s-app=example,pod-template-hash=759469945f
example-ff8f986960 0        0       0       19h     example,example-sidecar example-image:vX.X.X,example-image-sidecar:vX.X.X k8s-app=example,pod-template-hash=ff8f98696
kubectl describe deployments example -n kube-system

Name:                   example
Namespace:              kube-system
CreationTimestamp:      Tue, 03 Mar 2020 00:48:18 +0530
Labels:                 k8s-app=example
Annotations:            deployment.kubernetes.io/revision: 27
Selector:               k8s-app=example
Replicas:               1 desired | 1 updated | 2 total | 1 available | 1 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           k8s-app=example
  Service Account:  example
  Containers:
   example:
    Image:       example-image:vX.X.X
    Ports:       8080/TCP, 8081/TCP
    Host Ports:  0/TCP, 0/TCP
    Limits:
      cpu:     1630m
      memory:  586Mi
    Requests:
      cpu:        1630m
      memory:     586Mi
    Readiness:    http-get http://:8080/healthz delay=5s timeout=5s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:       <none>
   example-sidecar:
    Image:      example-image-sidecar:vX.X.X
    Port:       <none>
    Host Port:  <none>
    Command:
      command
      --container=example
      --cpu=200m
      --extra-cpu=10m
      --memory=300Mi
      --extra-memory=2Mi
      --threshold=5
      --deployment=example
    Limits:
      memory:  400Mi
    Requests:
      cpu:  100m
    Environment:
      MY_POD_NAME:        (v1:metadata.name)
      MY_POD_NAMESPACE:   (v1:metadata.namespace)
    Mounts:              <none>
  Volumes:               <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
OldReplicaSets:  example-759469945f (1/1 replicas created)
NewReplicaSet:   example-66db79f578 (1/1 replicas created)
Events:          <none>
kubectl rollout history deployments example -n kube-system

deployment.extensions/example 
REVISION  CHANGE-CAUSE
1         <none>
16        <none>
17        <none>
21        <none>
24        <none>
26        <none>
27        <none>
-- Arpit Airan
kubernetes
kubernetes-helm

2 Answers

3/13/2020

In your case it would help to specify the Deployment update strategy explicitly.
From your deployment description we can see the options you are getting by default:

StrategyType:           RollingUpdate
RollingUpdateStrategy:  25% max unavailable, 25% max surge

You can see the full YAML using the following command:
kubectl get deployment example -n kube-system -o yaml --export

Note: the kube-system namespace is not the best place for your custom deployments. I would recommend using a dedicated namespace created with kubectl create ns namespace-name, or the default namespace.

There are two ways to fix the issue. You should force the deployment to get rid of the old pod(s) before creating new ones:

Way one: here I've set the update type to "Recreate". On update, the Deployment kills all pods at once and then starts the new version of the pods with the same number of replicas. Some service downtime is expected, even if replicas > 1.

strategy:
  type: Recreate

Way two: in the next example I've set the rolling update options to maxSurge=0 and maxUnavailable=1. On the next update, the Deployment kills one pod as a first step, then creates the new version of that pod to keep the total replica count equal to the amount set in the spec. Once the new pod becomes ready, the process repeats with the next pod. If you have only one replica, some downtime is also expected.

strategy:
  rollingUpdate:
    maxSurge: 0
    maxUnavailable: 1
  type: RollingUpdate
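For reference, here is a sketch of where the strategy block sits in a manifest like the one from the question (only the strategy block is new; everything else mirrors the question's template, with the pod template body omitted):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example
  namespace: kube-system
spec:
  replicas: 1
  strategy:                  # sibling of replicas/selector/template
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0            # never run more pods than desired
      maxUnavailable: 1      # allow one pod to go down during the update
  selector:
    matchLabels:
      k8s-app: example
  template:
    # ... pod template from the question, unchanged ...
```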

Update type explanation:
kubectl explain deployment.spec.strategy.type

Type of deployment. Can be "Recreate" or "RollingUpdate". Default is
 RollingUpdate.

Options explanations:
kubectl explain deployment.spec.strategy.rollingUpdate

 maxSurge   <string>
   The maximum number of pods that can be scheduled above the desired number
   of pods. Value can be an absolute number (ex: 5) or a percentage of desired
   pods (ex: 10%). This can not be 0 if MaxUnavailable is 0. Absolute number
   is calculated from percentage by rounding up. By default, a value of 1 is
   used. Example: when this is set to 30%, the new RC can be scaled up
   immediately when the rolling update starts, such that the total number of
   old and new pods do not exceed 130% of desired pods. Once old pods have
   been killed, new RC can be scaled up further, ensuring that total number of
   pods running at any time during the update is at most 130% of desired pods.

 maxUnavailable <string>
   The maximum number of pods that can be unavailable during the update. Value
   can be an absolute number (ex: 5) or a percentage of desired pods (ex:
   10%). Absolute number is calculated from percentage by rounding down. This
   can not be 0 if MaxSurge is 0. By default, a fixed value of 1 is used.
   Example: when this is set to 30%, the old RC can be scaled down to 70% of
   desired pods immediately when the rolling update starts. Once new pods are
   ready, old RC can be scaled down further, followed by scaling up the new
   RC, ensuring that the total number of pods available at all times during
   the update is at least 70% of desired pods.
-- VAS
Source: StackOverflow

3/2/2020

Theory

I tried to update the Deployment and only the new ReplicaSet is updated. The old one remains the same.

In this case the issue would be that you have two different Deployments: one that you are editing (so its ReplicaSet gets updated) and another, "old" one that was created some other way.

Normally, you can't delete a ReplicaSet easily because it is controlled by another entity.

In Kubernetes you can delete such a ReplicaSet in the following way:

  • Find the name of the "old" rs with kubectl get replicaset -n kube-system.
  • Find the object the "old" rs is controlled by: kubectl describe rs <rs-name> -n kube-system (look at the "Controlled By" line).
  • Delete the object that is the parent of that rs.
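Sketched as commands, these steps look like this (the ReplicaSet name and namespace below are taken from the question's output; treat them as placeholders):

```
# 1. list the ReplicaSets in the namespace
kubectl get replicaset -n kube-system

# 2. find the owner of the "old" rs via the "Controlled By" line
kubectl describe rs example-759469945f -n kube-system | grep "Controlled By"

# the same information can be read from ownerReferences directly
kubectl get rs example-759469945f -n kube-system \
  -o jsonpath='{.metadata.ownerReferences[0].kind}/{.metadata.ownerReferences[0].name}'

# 3. delete the parent object (here assumed to be a Deployment)
kubectl delete deployment <parent-name> -n kube-system
```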

Practice

The fact that you are observing multiple ReplicaSets means that you have been trying to update the Deployment.

kubectl get rs -o wide -n kube-system | grep example

NAME               DESIRED  CURRENT     READY   AGE     
example-6d4f99bc54 0        0       0   12h 
example-59d46955b6 0        0       0   13h 
example-5855866cdb 0        0       0   18h 
example-ccc5cf5cd0 0        0       0   18h 
example-66db79f578 1        1       0   19h     
example-759469945f 1        1       1   19h 
example-ff8f986960 0        0       0   19h 

From that output we can see that example-759469945f, created 19h ago, is alive (DESIRED/CURRENT/READY = 1/1/1). After that there were attempts to update it, so the other ReplicaSets were created one by one by the update process.

All these attempts were unsuccessful due to an issue with the Deployment (we'll discuss that later).

After a few unsuccessful attempts you rolled back to example-66db79f578, which was created with the broken Deployment as well (that is why the latest one, example-6d4f99bc54, has 0/0/0 instead of 1/1/0).

That broken Deployment is the root cause of why you have two ReplicaSets with CURRENT=1 (example-759469945f and example-66db79f578, with 1/1/1 and 1/1/0 respectively).

Please note that the RollingUpdate strategy is used in this Deployment.

StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge

That is why the old rs is not decommissioned while the new one is not yet completely "up and running" (i.e. does not yet have matching values for DESIRED/CURRENT/READY).

You'll end up with only one ReplicaSet as soon as you fix the Deployment and apply the changes.

To fix the Deployment, you need to check what went wrong while Kubernetes tried to create a pod for example-66db79f578.

You can do that in the following way:

# check the pod name
kubectl get pods -n kube-system | grep 66db79f578

# describe the pod; the "Events:" section should show the root cause
kubectl describe pod example-66db79f578-<somehash> -n kube-system

# additionally, you can check the logs of the containers in that pod
kubectl logs example-66db79f578-<somehash> -c example -n kube-system
kubectl logs example-66db79f578-<somehash> -c example-sidecar -n kube-system

# fix :)
# apply changes

As soon as you fix the broken Deployment you'll be able to apply it with no issues.
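As a side note, the revisionHistory attempt from the question maps to the spec.revisionHistoryLimit field, and it only prunes inactive (scaled-to-zero) ReplicaSets; that is why it appeared to have no effect while the old rs still had a pod. Once the Deployment is healthy, a fragment like this keeps the history short (the value 2 is just an illustrative choice):

```yaml
spec:
  replicas: 1
  revisionHistoryLimit: 2   # keep at most two old, inactive ReplicaSets
```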

Hope that helps.

-- Nick
Source: StackOverflow