I managed to create a GKE cluster with a GCE ingress successfully. However, it takes a long time for the Ingress to detect that the service is ready (I have already set both livenessProbe and readinessProbe). My pods are set up as follows:
Containers:
...
gateway:
Liveness: http-get http://:5100/api/v1/gateway/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:5100/api/v1/gateway/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
...
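For reference, a pod spec like the following produces the probe output above (a minimal sketch; the path, port, and probe timings match the describe output, while the container name and image are placeholders):

```yaml
# Hypothetical container spec matching the probe output above.
containers:
  - name: gateway
    image: gcr.io/my-project/gateway:latest   # placeholder image
    ports:
      - containerPort: 5100
    livenessProbe:
      httpGet:
        path: /api/v1/gateway/healthz
        port: 5100
      initialDelaySeconds: 0
      timeoutSeconds: 1
      periodSeconds: 10
      successThreshold: 1
      failureThreshold: 3
    readinessProbe:
      httpGet:
        path: /api/v1/gateway/healthz
        port: 5100
      initialDelaySeconds: 0
      timeoutSeconds: 1
      periodSeconds: 10
      successThreshold: 1
      failureThreshold: 3
```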
and my ingress is:
...
Name: main-ingress
Host Path Backends
---- ---- --------
<host>
/api/v1/gateway/ gateway:5100 (<ip:5100>)
/api/v1/gateway/* gateway:5100 (<ip:5100>)
web:80 (<ip>)
Annotations:
ingress.kubernetes.io/backends: {"k8s-be-***":"HEALTHY","k8s-be-***":"HEALTHY","k8s-be-***":"HEALTHY"}
kubernetes.io/ingress.allow-http: false
What I notice is that if I kill all the pods and redeploy, the backend stays UNHEALTHY
for quite some time before the Ingress picks it up, even though Kubernetes itself already reports that the pods/service are running.
I also noticed that when livenessProbe and readinessProbe are set, the backend health check generated by ingress-gce is the following:
Backend
Timeout: 30 seconds
Backend Health check
Interval: 70 seconds
Timeout: 1 second
Unhealthy threshold: 10 consecutive failures
Healthy threshold: 1 success
Whereas if I just deploy a simple nginx pod without specifying livenessProbe or readinessProbe, the generated backend is the following:
Backend
Timeout: 30 seconds
Backend Health Check
Interval: 60 seconds
Timeout: 60 seconds
Unhealthy threshold: 10 consecutive failures
Healthy threshold: 1 success
Is the backend health check the root cause of the slow detection? If so, is there any way to speed it up?
UPDATE: I wanted to clarify after reading @yyyyahir's answer below.
I understand that creating a new Ingress takes much longer because the ingress controller needs to provision the new load balancer, backends, and all the other related resources.
However, what I also notice is that when I release a new version of the service (through Helm; the deployment strategy is set to Recreate rather than RollingUpdate), or when a pod dies (out of memory) and is restarted, it takes quite a while before the backend status is healthy again even though the pod is already in a running/healthy state (this is with the existing Ingress and load balancer in GCP). Is there a way to speed this up?
When using GCE Ingress, you need to wait for the load balancer provisioning time before the backend service is deemed healthy.
Consider that when you use this ingress class, you're relying on the GCE infrastructure that automatically has to provision an HTTP(S) load balancer and all of its components before sending requests into the cluster.
When you set up a deployment without a readinessProbe, the default values are applied to the load balancer health check:
Backend Health Check
Interval: 60 seconds
Timeout: 60 seconds
Unhealthy threshold: 10 consecutive failures
Healthy threshold: 1 success
However, using a readinessProbe adds its periodSeconds value to the default health check configuration. So, in your case: 10 seconds (periodSeconds) + 60 seconds (default) = 70 seconds.
Backend Health check
Interval: 70 seconds
Timeout: 1 second
Unhealthy threshold: 10 consecutive failures
Healthy threshold: 1 success
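Numerically, the generated check above follows from the probe in the question (a sketch for illustration; the left-hand names are just labels, not real API fields):

```yaml
# How the generated interval is derived (values from the question):
probe_period_seconds: 10        # readinessProbe.periodSeconds in the pod spec
default_interval_seconds: 60    # GCE default health check interval
generated_interval_seconds: 70  # 60 + 10, matching the backend above
# Worst case after a pod restart: up to one full 70 s interval must pass
# before the backend is probed and reported healthy again.
```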
Note that GKE only uses the readinessProbe to configure the health check in the load balancer; the livenessProbe is never picked up.
This means that the lowest possible interval is always that of the default load balancer health check, 60 seconds. Since these values are set automatically when the load balancer is provisioned by GKE, there is no way to change them.
Wrapping up, you have to wait for the load balancer provisioning period (around 1-3 minutes) plus the periodSeconds value set in your readinessProbe.