I created a three-node cluster via a Kubernetes Deployment using the TDengine 2.4.0.3 image. Viewing the pod information:
kubectl get pods -n mytdengine
NAME READY STATUS RESTARTS AGE
tdengine-01 1/1 Running 0 1m45s
tdengine-02 1/1 Running 0 1m45s
tdengine-03 1/1 Running 0 1m45s
Everything was going well.
However, when I tried to stop a pod with a delete operation:
kubectl delete pod tdengine-03 -n mytdengine
The target pod was not deleted as expected. Its status changed to:
NAME READY STATUS RESTARTS AGE
tdengine-01 1/1 Running 0 2m35s
tdengine-02 1/1 Running 0 2m35s
tdengine-03 1/1 Terminating 0 2m35s
After several tests, the pod was only actually deleted after about 3 minutes, which is abnormal. I was not actually using the TDengine instance, so there was no heavy load or storage pressure; I could not find any reason why shutdown should take 3 minutes.
After further testing, I ruled out problems with the Kubernetes configuration. Moreover, I noticed the parameter `terminationGracePeriodSeconds` in the pod's YAML file was set to 180:
terminationGracePeriodSeconds: 180
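For context, this field lives at the pod-spec level, not on the container. The field name below is standard Kubernetes; the surrounding names and image tag are illustrative, taken from this setup:

```yaml
spec:
  # Kubernetes waits this long after SIGTERM before sending SIGKILL
  terminationGracePeriodSeconds: 180
  containers:
  - name: tdengine
    image: tdengine/tdengine:2.4.0.3
```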
This suggested that the pod was not shutting down gracefully, but was being forcibly removed only after the timeout expired.
Generally speaking, stopping a pod begins with Kubernetes sending the container a SIGTERM signal. If the container handles the signal correctly, it shuts down gracefully. However, if the process does not stop, or does not respond to the signal within the timeout set by `terminationGracePeriodSeconds`, the container receives SIGKILL and is forcibly killed. Ref: https://tasdikrahman.me/2019/04/24/handling-singals-for-applications-in-kubernetes-docker/
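As a minimal illustration of this mechanism (generic, not TDengine-specific): a process that traps SIGTERM can clean up and exit on its own, while one that ignores the signal keeps running until SIGKILL arrives.

```shell
#!/bin/bash
# Minimal sketch of graceful shutdown: trap SIGTERM and exit cleanly.
# Without the trap, this loop would run until SIGKILL.
graceful_exit() {
    echo "SIGTERM received, cleaning up"
    exit 0
}
trap graceful_exit TERM

# Simulate the main workload
while true; do
    sleep 1
done
```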
The root cause is that in the TDengine 2.4.0.3 image, the startup script launches taosadapter first and then taosd, but it does not install a handler for SIGTERM. Due to the special role of PID 1 in Linux, when Kubernetes sends SIGTERM to the container, only PID 1 receives it (as shown in the process listing below, PID 1 is the startup script), and the script never forwards the signal to taosadapter and taosd, so they are left running untouched.
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
9 root 20 0 2873404 80144 2676 S 2.3 0.5 112:30.81 taosadapter
8 root 20 0 2439240 41364 2996 S 1.0 0.3 130:53.67 taosd
1 root 20 0 20044 1648 1368 S 0.0 0.0 0:00.01 run_taosd.sh
7 root 20 0 20044 476 200 S 0.0 0.0 0:00.00 run_taosd.sh
135 root 20 0 20176 2052 1632 S 0.0 0.0 0:00.00 bash
146 root 20 0 38244 1788 1356 R 0.0 0.0 0:00.00 top
I personally chose to add a preStop lifecycle hook in the Kubernetes YAML, so that the daemon is signaled directly and the container can exit promptly:
lifecycle:
  preStop:
    exec:
      command:
      - /bin/bash
      - -c
      - |
        procnum=$(ps aux | grep taosd | grep -v -e grep -e entrypoint -e run_taosd | awk '{print $2}')
        kill -15 $procnum
        if [ "$?" -eq 0 ]; then
          echo "kill taosd: SIGTERM sent"
        fi
Of course, once the cause of the problem is known, there are other solutions as well, which are not discussed in detail here.
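For example, one such alternative (a hedged sketch, assuming the image's startup script can be replaced) is an entrypoint that traps SIGTERM as PID 1 and forwards it to both daemons, so the pod shuts down well within the grace period:

```shell
#!/bin/bash
# Hypothetical entrypoint: as PID 1, trap SIGTERM and forward it to
# the child daemons, then wait for them to exit before returning.
forward_term() {
    kill -TERM $adapter_pid $taosd_pid 2>/dev/null
    wait
    exit 0
}
trap forward_term TERM

taosadapter &
adapter_pid=$!
taosd &
taosd_pid=$!

# 'wait' blocks here but is interrupted when the trap fires
wait
```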