I am writing a gRPC service and using gRPC health checking on Kubernetes (https://github.com/grpc-ecosystem/grpc-health-probe). In my server, I added separate implementations of the health endpoint (one for liveness, the other for readiness). I am wondering how this probe utility binary differentiates between a liveness check and a readiness check? There should be some other way to define it in the YAML, not just ["bin/grpc_health_probe", "-addr=:8801"]
server = ServerBuilder.forPort(port)
  .addService(new GrpcModuleHealthCheck())
  .addService(new GrpcModuleReadinessCheck())
  .addService(ProtoReflectionService.newInstance())
  .build.start
In my Kubernetes deployment YAML, I am using the configuration below:
livenessProbe:
  failureThreshold: 3
  exec:
    command: ["bin/grpc_health_probe", "-addr=:8801"]
  initialDelaySeconds: 240
  periodSeconds: 20
  successThreshold: 1
  timeoutSeconds: 15
readinessProbe:
  failureThreshold: 3
  exec:
    command: ["bin/grpc_health_probe", "-addr=:8801"]
  initialDelaySeconds: 20
  periodSeconds: 20
  successThreshold: 1
  timeoutSeconds: 15
I just tested and found that the "GrpcModuleReadinessCheck" implementation (the health class I added last) is the one that takes effect when I exec into my Kubernetes pod:
kubectl exec -it <MY_POD_NAME> -- /bin/bash
bash-4.4$ ./grpc_health_probe -addr=localhost:8801
status: SERVING
I am wondering how this probe utility binary differentiates between liveness check vs readiness check?
In short, it doesn't.
Kubernetes defines two distinct checks: liveness, which checks whether the program is still working properly (i.e. has not hung), and readiness, which checks whether the program is willing to accept more requests.
However, gRPC only defines a single health checking protocol and does not have a native concept of "readiness check".
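For reference, the protocol in question is grpc.health.v1, which boils down to a single Check RPC returning one of a few statuses. A rough sketch of the v1 definitions (newer revisions also add a Watch RPC and a SERVICE_UNKNOWN status):

```proto
syntax = "proto3";
package grpc.health.v1;

message HealthCheckRequest {
  // Optional name of the service to check; empty means the server overall.
  string service = 1;
}

message HealthCheckResponse {
  enum ServingStatus {
    UNKNOWN = 0;
    SERVING = 1;
    NOT_SERVING = 2;
  }
  ServingStatus status = 1;
}

service Health {
  rpc Check(HealthCheckRequest) returns (HealthCheckResponse);
}
```

Note there is no liveness/readiness distinction anywhere in the protocol: a probe just gets back one status.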
It is up to you how you want to map gRPC responses to Kubernetes checks. A reasonable way would be to interpret a SERVING response as the service being alive and ready to accept more requests, a NOT SERVING response as the service being alive but not accepting requests, and UNKNOWN or a failure to respond as the service not being alive.
Here is the probe configuration implementing that:
livenessProbe:
  failureThreshold: 3
  exec:
    # considers both SERVING and NOT SERVING to be a success
    command: ["/bin/sh", "-c", "bin/grpc_health_probe -addr=:8801 2>&1 | grep -q SERVING"]
  initialDelaySeconds: 240
  periodSeconds: 20
  successThreshold: 1
  timeoutSeconds: 15
readinessProbe:
  failureThreshold: 3
  exec:
    # fails on any response except SERVING
    command: ["bin/grpc_health_probe", "-addr=:8801"]
  initialDelaySeconds: 20
  periodSeconds: 20
  successThreshold: 1
  timeoutSeconds: 15
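The liveness command works because grep -q SERVING also matches the substring SERVING inside the word NOT_SERVING, while a timeout or connection error matches nothing; the 2>&1 merges stderr, where the probe prints its non-success messages. A quick shell sketch of how each possible output is classified (the probe messages below are illustrative stand-ins, not the binary's exact wording):

```shell
#!/bin/sh
# Classify a line of probe output the same way the liveness command does:
# exit-status of `grep -q SERVING` decides success vs failure.
classify() {
  echo "$1" | grep -q SERVING && echo alive || echo dead
}

classify 'status: SERVING'                                  # alive
classify 'service unhealthy (responded with "NOT_SERVING")' # alive
classify 'timeout: health rpc did not complete'             # dead
```

The readiness command, by contrast, relies directly on the probe binary's exit code, which is nonzero for anything other than SERVING, so kubelet marks the pod unready on NOT SERVING as well as on errors.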