I am trying to achieve a zero-downtime deployment process, but it is not working.
My Deployment has one replica. The pod probes look like this:
livenessProbe:
  httpGet:
    path: /health/live
    port: 80
  initialDelaySeconds: 15
  periodSeconds: 20
readinessProbe:
  httpGet:
    path: /health/ready
    port: 80
  initialDelaySeconds: 15
  periodSeconds: 20
During a deployment, requests to the pod return 503 for at least 10 seconds. What am I doing wrong? Running kubectl describe on the pod I get:
Liveness: http-get http://:80/health/live delay=5s timeout=1s period=2s #success=1 #failure=3
Readiness: http-get http://:80/health/ready delay=5s timeout=1s period=2s #success=1 #failure=3
You need to use the RollingUpdate strategy in your Deployment in addition to the probes:
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 25%
    maxSurge: 1
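With one replica the rounding matters: maxUnavailable of 25% of 1 replica rounds down to 0, so the old pod is only terminated once the surge pod passes its readiness probe. A minimal sketch of how the strategy and the probes fit together in one Deployment (the name, labels, and image are placeholders, not from the original question):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: maintenance-api
spec:
  replicas: 1
  selector:
    matchLabels:
      app: maintenance-api
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0  # never take the old pod down before the new one is Ready
      maxSurge: 1        # allow one extra pod during the rollout
  template:
    metadata:
      labels:
        app: maintenance-api
    spec:
      containers:
        - name: maintenance-api
          image: example.azurecr.io/maintenance-api:1.0.0  # placeholder image
          ports:
            - containerPort: 80
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 80
            initialDelaySeconds: 15
            periodSeconds: 20
          livenessProbe:
            httpGet:
              path: /health/live
              port: 80
            initialDelaySeconds: 15
            periodSeconds: 20

Setting maxUnavailable explicitly to 0 also makes the intent clearer than relying on the 25% rounding.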
There is an interesting complete example here.
The problem turned out to be in the Service:
kind: Service
spec:
  type: ClusterIP
  selector:
    app: maintenance-api
    version: "1.0.0"
    stage: #{Release.EnvironmentName}#
    release: #{Release.ReleaseName}#
If the selector contains something like #{Release.ReleaseName}#, which changes with every release, the Service can no longer match the old pod: as soon as the release starts, the Service drops the old pod from its endpoints, and only once the new pod finishes deploying does it start routing traffic again.
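Assuming app: maintenance-api is the only label that stays the same across releases, a sketch of the fix is to keep the release-specific tokens out of the Service selector (they can stay on the pod labels for traceability); the metadata name and ports here are placeholders:

kind: Service
apiVersion: v1
metadata:
  name: maintenance-api
spec:
  type: ClusterIP
  selector:
    app: maintenance-api  # only labels that survive a release
  ports:
    - port: 80
      targetPort: 80

With a stable selector, the Service keeps the old pod in its endpoints while the new one rolls out, so the readiness probe, not the release name, decides when traffic moves over.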