Kubernetes rollout gives 503 error when switching web pods

2/15/2017

I'm running this command:

kubectl set image deployment/www-deployment VERSION_www=newImage

This works, but there's a 10-second window during which the website returns 503, and I'm a perfectionist.

How can I configure Kubernetes to wait for the new image to be available before switching the ingress?

I'm using the nginx ingress controller from here:

gcr.io/google_containers/nginx-ingress-controller:0.8.3

And this yaml for the web server:

# Service and Deployment
apiVersion: v1
kind: Service
metadata:
  name: www-service
spec:
  ports:
  - name: http-port
    port: 80
    protocol: TCP
    targetPort: http-port
  selector:
    app: www
  sessionAffinity: None
  type: ClusterIP
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: www-deployment
spec:
  replicas: 1
  template:
    metadata:
      labels: 
        app: www
    spec:
      containers:
      - image: myapp/www
        imagePullPolicy: Always
        livenessProbe:
          httpGet:
            path: /healthz
            port: http-port
        name: www
        ports:
        - containerPort: 80
          name: http-port
          protocol: TCP
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
        volumeMounts:
        - mountPath: /etc/env-volume
          name: config
          readOnly: true
      imagePullSecrets:
      - name: cloud.docker.com-pull
      volumes:
      - name: config
        secret:
          defaultMode: 420
          items:
          - key: www.sh
            mode: 256
            path: env.sh
          secretName: env-secret

The Docker image is based on a node.js server image.

/healthz is a file on the web server that returns ok. I thought the liveness probe would make sure the server was up and ready before switching to the new version.

Thanks in advance!

-- Michael Cole
kubernetes
kubernetes-health-check

1 Answer

2/15/2017

The Pod lifecycle documentation defines that:

The default state of Liveness before the initial delay is Success.

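A liveness probe only decides when a container gets restarted; the readiness probe is what controls whether the Pod receives traffic through the Service. To avoid the issue, configure a readinessProbe for your Pods as well, and consider setting .spec.minReadySeconds on your Deployment.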

You'll find details in the Deployment documentation.
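
As a rough illustration of that advice, here is a minimal sketch of the Deployment from the question with a readinessProbe and minReadySeconds added. The volume and pull-secret sections are left out for brevity. Reusing /healthz for readiness and the values chosen for minReadySeconds, initialDelaySeconds and periodSeconds are assumptions, not settings from the original setup, so tune them to how long the node.js server really takes to start.

```yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: www-deployment
spec:
  replicas: 1
  # Assumed value: a new pod must stay ready this long before it counts as available.
  minReadySeconds: 10
  template:
    metadata:
      labels:
        app: www
    spec:
      containers:
      - name: www
        image: myapp/www
        imagePullPolicy: Always
        ports:
        - containerPort: 80
          name: http-port
          protocol: TCP
        # Liveness: a failing probe causes the container to be restarted.
        livenessProbe:
          httpGet:
            path: /healthz
            port: http-port
        # Readiness: the pod is only added to the Service endpoints once this succeeds.
        # Reusing /healthz here is an assumption.
        readinessProbe:
          httpGet:
            path: /healthz
            port: http-port
          initialDelaySeconds: 5   # assumed
          periodSeconds: 5         # assumed
```

If the 503 window persists after this change, the probe timings are the first thing to revisit.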

-- pagid
Source: StackOverflow