I was testing my Kubernetes services recently, and I found them very unreliable. Here is the situation:
1. The test service 'A', which receives HTTP requests on port 80, has five pods deployed across three nodes.
2. An nginx ingress was set up to route outside traffic to service 'A'.
3. The ingress was configured like this:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: test-A
  annotations:
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "1s"
    nginx.ingress.kubernetes.io/proxy-next-upstream: "error timeout invalid_header http_502 http_503 http_504"
    nginx.ingress.kubernetes.io/proxy-next-upstream-tries: "2"
spec:
  rules:
  - host: <test-url>
    http:
      paths:
      - path: /
        backend:
          serviceName: A
          servicePort: 80
When I restarted one of the nodes manually, things went wrong:
For the next 3 minutes, about 20% of requests timed out, which is unacceptable in a production environment.
I don't know why k8s reacts so slowly. Is there a way to solve this problem?
You can speed up that failover process by configuring liveness and readiness probes in the Pod spec:
Container probes
...
livenessProbe: Indicates whether the Container is running. If the liveness probe fails, the kubelet kills the Container, and the Container is subjected to its restart policy. If a Container does not provide a liveness probe, the default state is Success.
readinessProbe: Indicates whether the Container is ready to service requests. If the readiness probe fails, the endpoints controller removes the Pod’s IP address from the endpoints of all Services that match the Pod. The default state of readiness before the initial delay is Failure. If a Container does not provide a readiness probe, the default state is Success.
Liveness probe example:
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: liveness
    image: k8s.gcr.io/busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
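For this failover case the readiness probe matters even more, since it is what removes a pod's IP from the Service endpoints. A minimal sketch for an HTTP service like 'A' (the /healthz path and the image placeholder are assumptions; use whatever health endpoint the app actually exposes):

apiVersion: v1
kind: Pod
metadata:
  labels:
    app: A
  name: readiness-http
spec:
  containers:
  - name: a
    image: <your-image>     # placeholder for the image used by service 'A'
    ports:
    - containerPort: 80
    readinessProbe:
      httpGet:
        path: /healthz      # assumed health endpoint
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 2
      failureThreshold: 2

With periodSeconds: 2 and failureThreshold: 2, an unhealthy pod drops out of the Service endpoints within a few seconds. Note, however, that probes are run by the kubelet on the pod's own node, so they cannot help while that node itself is down.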
Thanks for @VAS's answer!
A liveness probe is one way to mitigate this problem.
But I finally figured out that what I wanted was passive health checking, which k8s doesn't support.
I solved the problem by introducing Istio into my cluster.
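For reference, passive health checking in Istio is configured as outlier detection on a DestinationRule. A minimal sketch for service 'A' (the numbers are illustrative, and older Istio releases spell the first field consecutiveErrors):

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: a-passive-health-check
spec:
  host: A
  trafficPolicy:
    outlierDetection:
      consecutive5xxErrors: 3   # eject a pod after 3 consecutive 5xx responses
      interval: 10s             # how often pods are scanned
      baseEjectionTime: 30s     # how long an ejected pod stays out of the pool
      maxEjectionPercent: 50    # never eject more than half of the pods

Because the sidecar proxies observe actual request failures, pods on a dead node are ejected from load balancing as soon as requests to them start failing, without waiting for the endpoints controller to catch up.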