GKE RollingUpdate having downtime

3/19/2019

I am attempting to do a rolling update of a deployment and I am still getting about 2 seconds of downtime. Did I misconfigure my YAML? I am also using Ambassador as the API gateway, and I am not entirely sure whether the issue is in the API gateway or in the deployment.

I wrote a simple shell script that runs curl every second, and I always see about 2 seconds of downtime with this config. Any help is much appreciated.

Here is my relevant config.

Deployment Method:

kubectl apply -f MY_DEPLOYMENT.yaml

Deployment Snippet:

apiVersion: apps/v1 
kind: Deployment
metadata:
  name: web-service-rf
spec:
  selector:
    matchLabels:
      app: web-service-rf
  replicas: 2 # tells deployment to run 2 pods matching the template
  strategy:
    rollingUpdate:
      maxSurge: 4
      maxUnavailable: 0%

Liveness and Readiness Probe:

  livenessProbe:
    httpGet:
      path: /health/liveness
      port: 80
    initialDelaySeconds: 5
    periodSeconds: 3
  readinessProbe:
    httpGet:
      path: /health/readiness
      port: 80
    initialDelaySeconds: 5
    periodSeconds: 5
    successThreshold: 1 
-- mornindew
google-kubernetes-engine
kubernetes

2 Answers

3/19/2019

The maxUnavailable should be the number of pods that can be unavailable at the same time, not a percentage of them. My guess is that putting the % sign on the value causes a parsing error, which results in this behavior.
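If that is the case, the same intent could be expressed with a plain integer instead, for example (only a sketch of the suggested change, keeping your existing maxSurge):

  strategy:
    rollingUpdate:
      maxSurge: 4
      maxUnavailable: 0   # absolute pod count instead of a percentage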

-- Lukas Eichler
Source: StackOverflow

3/19/2019

Your YAML seems fine. I suspect the 2 seconds of downtime may be due to a TCP connection that did not flush properly to the pod that is being replaced.

Can you add a preStop hook to your container to make sure that all TCP connections are drained before the pod is terminated?
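A minimal sketch of such a hook, assuming a short fixed sleep is enough for in-flight connections to finish (the container name, sleep duration, and grace period below are placeholders, not values from your deployment):

spec:
  terminationGracePeriodSeconds: 30   # must be longer than the preStop delay
  containers:
  - name: web-service-rf
    lifecycle:
      preStop:
        exec:
          # keep the container alive briefly so the endpoint is removed
          # from the Service before the process receives SIGTERM
          command: ["/bin/sleep", "10"]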

-- Patrick W
Source: StackOverflow