According to the official Kubernetes documentation:
Rolling updates allow Deployments' update to take place with zero downtime by incrementally updating Pods instances with new ones
I was trying to perform a zero-downtime update using the RollingUpdate strategy, which is the recommended way to update an application in a Kubernetes cluster. Official reference:
https://kubernetes.io/docs/tutorials/kubernetes-basics/update/update-intro/
But I was a little bit confused about the definition while performing it: downtime of the application still happens. Here is my cluster info at the beginning, shown below:
liguuudeiMac:~ liguuu$ kubectl get all
NAME                                     READY   STATUS    RESTARTS   AGE
pod/ubuntu-b7d6cb9c6-6bkxz               1/1     Running   0          3h16m
pod/webapp-deployment-6dcf7b88c7-4kpgc   1/1     Running   0          3m52s
pod/webapp-deployment-6dcf7b88c7-4vsch   1/1     Running   0          3m52s
pod/webapp-deployment-6dcf7b88c7-7xzsk   1/1     Running   0          3m52s
pod/webapp-deployment-6dcf7b88c7-jj8vx   1/1     Running   0          3m52s
pod/webapp-deployment-6dcf7b88c7-qz2xq   1/1     Running   0          3m52s
pod/webapp-deployment-6dcf7b88c7-s7rtt   1/1     Running   0          3m52s
pod/webapp-deployment-6dcf7b88c7-s88tb   1/1     Running   0          3m52s
pod/webapp-deployment-6dcf7b88c7-snmw5   1/1     Running   0          3m52s
pod/webapp-deployment-6dcf7b88c7-v287f   1/1     Running   0          3m52s
pod/webapp-deployment-6dcf7b88c7-vd4kb   1/1     Running   0          3m52s

NAME                        TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
service/kubernetes          ClusterIP   10.96.0.1       <none>        443/TCP          3h16m
service/tc-webapp-service   NodePort    10.104.32.134   <none>        1234:31234/TCP   3m52s

NAME                                READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/ubuntu              1/1     1            1           3h16m
deployment.apps/webapp-deployment   10/10   10           10          3m52s

NAME                                           DESIRED   CURRENT   READY   AGE
replicaset.apps/ubuntu-b7d6cb9c6               1         1         1       3h16m
replicaset.apps/webapp-deployment-6dcf7b88c7   10        10        10      3m52s
deployment.apps/webapp-deployment is a Tomcat-based web application, and the Service tc-webapp-service maps to the Pods containing the Tomcat containers (the full deployment config files are shown at the end of this article). deployment.apps/ubuntu is just a standalone app in the cluster, which continuously sends HTTP requests to tc-webapp-service, so that I can trace the status of the so-called rolling update of webapp-deployment. The command run in the ubuntu container is shown below (an infinite loop of curl, once every 0.01 seconds):
for ((;;)); do curl -sS -D - http://tc-webapp-service:1234 -o /dev/null | grep HTTP; date +"%Y-%m-%d %H:%M:%S"; echo ; sleep 0.01 ; done;
And the output of the ubuntu app (everything is fine):
...
HTTP/1.1 200
2019-08-30 07:27:15
...
HTTP/1.1 200
2019-08-30 07:27:16
...
Then I tried to change the tag of the tomcat image from 8-jdk8 to 8-jdk11. Note that the rolling update strategy of deployment.apps/webapp-deployment had been configured with maxSurge 0 and maxUnavailable 9 (I got the same result when these two attributes were left at their defaults):
...
spec:
  containers:
  - name: tc-part
    image: tomcat:8-jdk8 -> tomcat:8-jdk11
...
Then, the output of the ubuntu app:
HTTP/1.1 200
2019-08-30 07:47:43
curl: (56) Recv failure: Connection reset by peer
2019-08-30 07:47:43
HTTP/1.1 200
2019-08-30 07:47:44
As shown above, some HTTP requests failed, and this is without doubt an interruption of the application while the rolling update is being performed. However, I can also reproduce the interruption described above by scaling down, with the command shown below (from 10 replicas to 2):

kubectl scale deployment.apps/webapp-deployment --replicas=2
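(To watch the old Pods being replaced while the update or scale-down is in progress, the standard kubectl commands below can be run in another terminal; the label selector matches the appName label from my config.)

kubectl rollout status deployment.apps/webapp-deployment
kubectl get pods -l appName=tc-webapp -w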
After performing the above tests, I am wondering what the so-called Zero downtime actually means. Although the way of mocking HTTP requests is a little bit contrived, the situation is perfectly normal for applications designed to handle thousands, even millions, of requests per second.
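For reference, a rolling update strategy that never drops below the desired replica count would look like the sketch below (the values are illustrative, not from my config; even with these settings, brief connection resets can still occur, because removing a terminating Pod from the Service endpoints happens asynchronously):

strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1        # at most 1 extra Pod above the desired count during the update
    maxUnavailable: 0  # never take an old Pod down before its replacement is Ready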
env:
liguuudeiMac:cacheee liguuu$ minikube version
minikube version: v1.3.1
commit: ca60a424ce69a4d79f502650199ca2b52f29e631
liguuudeiMac:cacheee liguuu$ kubectl version
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.3", GitCommit:"5e53fd6bc17c0dec8434817e69b04a25d8ae0ff0", GitTreeState:"clean", BuildDate:"2019-06-06T01:44:30Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.2", GitCommit:"f6278300bebbb750328ac16ee6dd3aa7d3549568", GitTreeState:"clean", BuildDate:"2019-08-05T09:15:22Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
Deployment & Service Config:
# Service
apiVersion: v1
kind: Service
metadata:
  name: tc-webapp-service
spec:
  type: NodePort
  selector:
    appName: tc-webapp
  ports:
  - name: tc-svc
    protocol: TCP
    port: 1234
    targetPort: 8080
    nodePort: 31234
---
# Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp-deployment
spec:
  replicas: 10
  selector:
    matchLabels:
      appName: tc-webapp
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 9
  # Pod Templates
  template:
    metadata:
      labels:
        appName: tc-webapp
    spec:
      containers:
      - name: tc-part
        image: tomcat:8-jdk8
        ports:
        - containerPort: 8080
        livenessProbe:
          tcpSocket:
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 10
        readinessProbe:
          httpGet:
            scheme: HTTP
            port: 8080
            path: /
          initialDelaySeconds: 5
          periodSeconds: 1
To deploy an application that will really update with zero downtime, the application should meet some requirements. To mention a few of them:

For example, when the shutdown signal is received, the application should no longer respond with 200 to new readiness probes, but it should still respond with 200 to liveness probes until all old requests have been processed.
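As an illustration, a minimal sketch of such a graceful shutdown (not part of my original config; it assumes the image provides /bin/sh, which the Debian-based tomcat tags do) is a preStop hook that delays SIGTERM, giving the endpoints controller and kube-proxy time to drop the terminating Pod from the Service before Tomcat stops accepting connections:

...
spec:
  terminationGracePeriodSeconds: 60   # upper bound for draining before SIGKILL
  containers:
  - name: tc-part
    image: tomcat:8-jdk11
    lifecycle:
      preStop:
        exec:
          # the 10s delay is illustrative; pick a value longer than the
          # endpoint-propagation delay observed in the cluster
          command: ["/bin/sh", "-c", "sleep 10"]
...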