Does Kubernetes support blue-green deployment?

11/19/2019

I would like to ask about the mechanism for stopping pods in Kubernetes.

I read https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods before asking this question.

Suppose we have an application with graceful shutdown support (for example, a simple HTTP server in Go: https://play.golang.org/p/5tmkPPMiSSt).

The server has two endpoints:

  • /fast, which always returns a 200 HTTP status code.
  • /slow, which waits 10 seconds and then returns a 200 HTTP status code.

There is a Deployment/Service pair with the following configuration:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app/name: test
  template:
    metadata:
      labels:
        app/name: test
    spec:
      terminationGracePeriodSeconds: 120
      containers:
        - name: service
          image: host.org/images/grace:v0.1
          livenessProbe:
            httpGet:
              path: /health
              port: 10002
            failureThreshold: 1
            initialDelaySeconds: 1
          readinessProbe:
            httpGet:
              path: /health
              port: 10002
            failureThreshold: 1
            initialDelaySeconds: 1
---
apiVersion: v1
kind: Service
metadata:
  name: test
spec:
  type: NodePort
  ports:
    - name: http
      port: 10002
      targetPort: 10002
  selector:
    app/name: test

To make sure the pods are deleted gracefully, I ran two tests.


First test (slow endpoint) flow:

  • Create the deployment with replicas set to 1.
  • Wait for the pod to become ready.
  • Send a request to the /slow endpoint (curl http://ip-of-some-node:nodePort/slow) and delete the pod almost simultaneously (about 1 second apart).

Expected:

The pod must not terminate before the HTTP server has completed my request.

Got:

Yes, the HTTP server finishes processing in 10 seconds and returns a response. (If we pass the --grace-period=1 option to kubectl, curl reports curl: (52) Empty reply from server.)

Everything works as expected.


Second test (fast endpoint) flow:

  • Create the deployment with replicas set to 10.
  • Wait for all pods to become ready.
  • Start wrk with the "Connection: close" header.
  • Randomly delete one or two pods (kubectl delete pod/xxx).

Expected:

No socket errors.

Got:

$ wrk -d 2m --header "Connection: Close" http://ip-of-some-node:nodePort/fast
Running 2m test @ http://ip-of-some-node:nodePort/fast
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   122.35ms  177.30ms   1.98s    91.33%
    Req/Sec    66.98     33.93   160.00     65.83%
  15890 requests in 2.00m, 1.83MB read
  Socket errors: connect 0, read 15, write 0, timeout 0
Requests/sec:    132.34
Transfer/sec:     15.64KB

There are 15 socket errors on read; that is, some pods were (presumably) disconnected from the Service before all in-flight requests were processed.

The problem appears when a new deployment version is applied, on scale-down, and on rollout undo.

Questions:

  1. What is the reason for this behavior?
  2. How can it be fixed?

Kubernetes version: v1.16.2

Edit 1.

The number of errors changes each time but stays in the range of 10-20 when removing 2-5 pods over two minutes.

P.S. If we do not delete any pods, there are no errors.

-- Rokker Ruslan
deployment
kubernetes
networking
sockets

1 Answer

11/22/2019

Does Kubernetes support blue-green deployment?

Yes, it does. You can read about it in Zero-downtime Deployment in Kubernetes with Jenkins:

A blue/green deployment is a change management strategy for releasing software code. Blue/green deployments, which may also be referred to as A/B deployments, require two identical hardware environments that are configured exactly the same way. While one environment is active and serving end users, the other environment remains idle.

Container technology offers a stand-alone environment to run the desired service, which makes it super easy to create identical environments as required in a blue/green deployment. The loose coupling between Services and ReplicaSets, and the label/selector-based service routing in Kubernetes, make it easy to switch between different backend environments.
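
As a minimal sketch of that label/selector-based switching (the resource name and label values below are illustrative, not taken from the linked article), a Service routes traffic by selector, so switching from blue to green only requires changing the selector and re-applying the manifest:

apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  ports:
    - name: http
      port: 10002
      targetPort: 10002
  selector:
    app: my-app
    color: blue   # change to "green" and re-apply to shift all traffic at once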

I would also recommend reading Kubernetes Infrastructure Blue/Green deployments.

Here is a repository with examples from codefresh.io about blue/green deployment.

This repository holds a bash script that allows you to perform blue/green deployments on a Kubernetes cluster. See also the respective blog post.

Prerequisites

As a convention, the script expects:

  1. The name of your deployment to be $APP_NAME-$VERSION
  2. Your deployment should have a label that shows its version
  3. Your service should point to the deployment by using a version selector that matches the corresponding label on the deployment

Notice that the new color deployment created by the script will follow the same conventions. This way each subsequent pipeline you run will work in the same manner.

You can see examples of these labels in the sample application in that repository.
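
For illustration only, a deployment and service following those conventions might look like this (the application name, version, and image below are hypothetical):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-1.0.0            # convention 1: $APP_NAME-$VERSION
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
      version: "1.0.0"
  template:
    metadata:
      labels:
        app: my-app
        version: "1.0.0"        # convention 2: a label that shows the version
    spec:
      containers:
        - name: my-app
          image: host.org/images/my-app:1.0.0
---
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  ports:
    - port: 10002
      targetPort: 10002
  selector:
    app: my-app
    version: "1.0.0"            # convention 3: service selects by version label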

You might also be interested in Canary deployments:

Another deployment strategy is using Canaries (a.k.a. incremental rollouts). With canaries, the new version of the application is gradually deployed to the Kubernetes cluster while getting a very small amount of live traffic (i.e. a subset of live users are connecting to the new version while the rest are still using the previous version). ...

The small subset of live traffic to the new version acts as an early warning for potential problems that might be present in the new code. As our confidence increases, more canaries are created and more users are now connecting to the updated version. In the end, all live traffic goes to canaries, and thus the canary version becomes the new “production version”.
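
A rough sketch of that idea (names, images, and replica counts below are illustrative): two deployments share the same app label, so a Service selecting only on that label spreads traffic across both, and the replica ratio controls how many users hit the canary.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-stable
spec:
  replicas: 9                   # ~90% of traffic stays on the stable version
  selector:
    matchLabels:
      app: my-app
      track: stable
  template:
    metadata:
      labels:
        app: my-app             # shared label the Service selects on
        track: stable
    spec:
      containers:
        - name: my-app
          image: host.org/images/my-app:1.0.0
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-canary
spec:
  replicas: 1                   # ~10% of traffic goes to the new version
  selector:
    matchLabels:
      app: my-app
      track: canary
  template:
    metadata:
      labels:
        app: my-app
        track: canary
    spec:
      containers:
        - name: my-app
          image: host.org/images/my-app:1.1.0

As confidence grows, the canary replica count is increased (and the stable count decreased) until all traffic goes to the new version.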

EDIT

Questions:

  1. What is the reason for this behavior?

When a new deployment is applied, old pods are removed and new ones are scheduled. This is done by the Control Plane:

For example, when you use the Kubernetes API to create a Deployment, you provide a new desired state for the system. The Kubernetes Control Plane records that object creation, and carries out your instructions by starting the required applications and scheduling them to cluster nodes–thus making the cluster’s actual state match the desired state.

You have only set up a readinessProbe, which tells your Service whether it should send traffic to the pod or not. This alone is not a good solution: as you can see in your example, if you have 10 pods and remove one or two, there is a gap and you receive socket errors.

  2. How can it be fixed?

You have to understand that this is not broken, so it does not need a fix.

This might be mitigated by implementing a check in your application to make sure it is sending requests to a working address, or by using other features such as load balancing, for example with an Ingress.

Also, when you are updating a deployment, you can check whether a pod still has any incoming/outgoing traffic before deleting it, and roll the update out only to pods that are not in use.
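
As one common pattern that illustrates this point (a sketch, not something prescribed above): a preStop hook that sleeps briefly keeps the terminating pod serving while the endpoint controller removes it from the Service, before SIGTERM is delivered and the application's graceful shutdown begins. Added to the container spec from the question, it might look like:

          lifecycle:
            preStop:
              exec:
                command: ["sleep", "5"]   # assumes the image ships a sleep binary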

-- Crou
Source: StackOverflow