Kubernetes deployment fails to perform rolling update when image tag changes

9/2/2016

[Originally raised as a GitHub issue, but I was asked to post it here]

I'm using GKE (Kubernetes 1.2), and most of the time everything works smoothly when performing a rolling update via a deployment config change.

Occasionally things don't.

When they don't, the deployment object is correctly updated, but no corresponding ReplicaSet (RS) is created, so the old pod (whose image no longer matches the deployment's) lives on.

Reproducing

I'm using CircleCI to run kubectl apply -f; the only change in the new configuration is the image, which is updated to point to a new tag.
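For reference, the CI step boils down to the following (a simplified sketch; the manifest file name and sed pattern are illustrative, not the actual pipeline config):

# point the manifest at the freshly pushed tag, then apply it;
# the image line is the only thing that differs between runs
sed -i 's|ployst-ui:.*|ployst-ui:1.1445|' ployst-ui-live-deployment.yaml
kubectl apply -f ployst-ui-live-deployment.yaml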

Broken state

Deployment

$ kubectl get deployments ployst-ui-live-deployment -o=yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "95"
    kubectl.kubernetes.io/last-applied-configuration: '{"kind":"Deployment","apiVersion":"extensions/v1beta1","metadata":{"name":"ployst-ui-live-deployment","creationTimestamp":null},"spec":{"replicas":1,"template":{"metadata":{"creationTimestamp":null,"labels":{"app":"ployst-ui","mode":"live"}},"spec":{"containers":[{"name":"ployst-ui","image":"eu.gcr.io/ployst-proto/ployst-ui:1.1445","ports":[{"containerPort":80}],"resources":{},"imagePullPolicy":"IfNotPresent"}],"restartPolicy":"Always","dnsPolicy":"ClusterFirst"}},"strategy":{"type":"RollingUpdate"}},"status":{}}'
  creationTimestamp: 2016-04-02T09:34:35Z
  generation: 202
  labels:
    app: ployst-ui
    mode: live
  name: ployst-ui-live-deployment
  namespace: default
  resourceVersion: "7467578"
  selfLink: /apis/extensions/v1beta1/namespaces/default/deployments/ployst-ui-live-deployment
  uid: 1c9fc5a2-f8b6-11e5-ae8f-42010af0000a
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ployst-ui
      mode: live
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: ployst-ui
        mode: live
    spec:
      containers:
      - image: eu.gcr.io/ployst-proto/ployst-ui:1.1445
        imagePullPolicy: IfNotPresent
        name: ployst-ui
        ports:
        - containerPort: 80
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 1
  observedGeneration: 200
  replicas: 1
  updatedReplicas: 1
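Note the gap between metadata.generation (202) and status.observedGeneration (200) in the output above: it suggests the deployment controller has not yet processed the latest spec. A quick way to check for this (a diagnostic sketch, not part of my original report):

# a healthy deployment reports two equal numbers here
kubectl get deployment ployst-ui-live-deployment \
  -o=jsonpath='{.metadata.generation} {.status.observedGeneration}'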

Pod

This pod should have been upgraded to the 1.1445 image the deployment specifies, but it is still running 1.1422:

$ kubectl get pods ployst-ui-live-deployment-1482854615-w6hsg -o=yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/created-by: |
      {"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"ployst-ui-live-deployment-1482854615","uid":"ba2714da-5343-11e6-8d11-42010af0000a","apiVersion":"extensions","resourceVersion":"7392451"}}
    kubernetes.io/limit-ranger: 'LimitRanger plugin set: cpu request for container
      ployst-ui; cpu limit for container ployst-ui'
  creationTimestamp: 2016-07-26T15:15:03Z
  generateName: ployst-ui-live-deployment-1482854615-
  labels:
    app: ployst-ui
    mode: live
    pod-template-hash: "1482854615"
  ...
spec:
  containers:
  - image: eu.gcr.io/ployst-proto/ployst-ui:1.1422
    imagePullPolicy: IfNotPresent
    name: ployst-ui
    ...

The deployment itself looks fine; it just seems to think it already has matching pods.

Does it not detect image tag changes? And if not, why does it work some of the time?

I'm trying to find the scheduler logs - perhaps they're not available to me on GKE?
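One more check that narrows this down: list the ReplicaSets matching the deployment's selector and the image each one templates (a diagnostic sketch using the labels from the manifest above). If no RS carries the new 1.1445 image, the controller never acted on the change:

kubectl get rs -l app=ployst-ui,mode=live
kubectl get rs -l app=ployst-ui,mode=live \
  -o=jsonpath='{range .items[*]}{.metadata.name}{": "}{.spec.template.spec.containers[0].image}{"\n"}{end}'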

UPDATE

Here is today's output of kubectl describe deployment ployst-ui-live-deployment (as requested):

Name:           ployst-ui-live-deployment
Namespace:          default
CreationTimestamp:      Sat, 02 Apr 2016 09:34:35 +0000
Labels:         app=ployst-ui,mode=live
Selector:           app=ployst-ui,mode=live
Replicas:           1 updated | 1 total | 1 available | 0 unavailable
StrategyType:       RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  1 max unavailable, 1 max surge
OldReplicaSets:     ployst-ui-live-deployment-935839453 (1/1 replicas created)
NewReplicaSet:      <none>
No events.

Note that OldReplicaSets here points to a much newer replica set than the one involved in the broken state above, since many deployments have happened since this issue arose.
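Until the root cause is found, the mismatch can at least be detected automatically after each deploy by comparing the image the deployment specifies against the images its pods actually run (a sketch; the selector labels are taken from the manifests above):

# image the deployment wants
kubectl get deployment ployst-ui-live-deployment \
  -o=jsonpath='{.spec.template.spec.containers[0].image}'

# images the matching pods actually run; any difference means a stuck rollout
kubectl get pods -l app=ployst-ui,mode=live \
  -o=jsonpath='{.items[*].spec.containers[0].image}'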

-- Alex Couper
Tags: deployment, google-cloud-platform, google-kubernetes-engine, kubernetes