Intercepting All Pod Shutdowns on Kubernetes to Perform Diagnostics

2/12/2021

I would like to intercept all pod shutdowns to perform diagnostic actions like fetching logs. Is this possible either via the k8s API or some type of Linux hook on process exit?

-- Natan Yellin
kubernetes

1 Answer

2/12/2021

I would encourage you to read about Logging Architecture in Kubernetes.

Application logs can help you understand what is happening inside your application. The logs are particularly useful for debugging problems and monitoring cluster activity.

Depending on your needs, you can configure logging at the node level or at the cluster level.

Cluster-level logging architectures require a separate backend to store, analyze, and query logs. Kubernetes does not provide a native storage solution for log data. Instead, there are many logging solutions that integrate with Kubernetes.

Depending on your environment (local or cloud) and your needs, you can use one of many integrated applications to centralize logs, such as Fluentd, Stackdriver, Datadog, Logspout, etc.

In short, with centralized logging you would still be able to get the logs from deleted pods and look for the root cause.

Another thing that might help you achieve your goal is Container Lifecycle Hooks, such as PostStart and PreStop.

Analogous to many programming language frameworks that have component lifecycle hooks, such as Angular, Kubernetes provides Containers with lifecycle hooks. The hooks enable Containers to be aware of events in their management lifecycle and run code implemented in a handler when the corresponding lifecycle hook is executed.

If you want to implement them in your setup, you can check the Attach Handlers to Container Lifecycle Events documentation. It uses the postStart and preStop events.

Kubernetes sends the postStart event immediately after a Container is started, and it sends the preStop event immediately before the Container is terminated. A Container may specify one handler per event.

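For example, you could configure a preStop handler to write some last logs or error details to a file. Keep in mind that preStop runs only when Kubernetes itself terminates the container (for example on pod deletion or a failed liveness probe), not when the process crashes or exits on its own, so it will not catch every shutdown.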
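As a rough sketch of the preStop idea, a Pod manifest could look something like the one below. The pod name, the image, the volume and the /diagnostics/prestop.log path are placeholders I made up for illustration; adjust them to your setup.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: lifecycle-demo              # hypothetical name
spec:
  containers:
  - name: app
    image: nginx                    # placeholder image; it needs /bin/sh for the hook below
    volumeMounts:
    - name: diagnostics
      mountPath: /diagnostics
    lifecycle:
      preStop:
        exec:
          # Runs inside the container before the TERM signal is sent.
          command:
          - /bin/sh
          - -c
          - echo "preStop fired at $(date)" >> /diagnostics/prestop.log
  volumes:
  - name: diagnostics
    # Assumption for the sketch only: an emptyDir is deleted together with the pod,
    # so use a persistent volume or hostPath if the file has to outlive it.
    emptyDir: {}
```

The time spent in the hook counts against the pod's termination grace period, so the handler should stay short.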

There is also an option to set a specific termination message path (terminationMessagePath) and write status or information about why the pod was terminated there. More details can be found in the Determine the Reason for Pod Failure documentation.
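A minimal sketch of the termination message approach, assuming a throwaway container that just sleeps and then records why it stopped. The names, the image and the message text are illustrative only; /dev/termination-log is the default path, spelled out here for clarity.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: termination-demo            # hypothetical name
spec:
  restartPolicy: Never
  containers:
  - name: app
    image: debian                   # placeholder image
    command: ["/bin/sh", "-c"]
    # Stand-in for a real workload: write the reason to the termination log, then exit.
    args: ["sleep 10 && echo 'Sleep expired' > /dev/termination-log"]
    terminationMessagePath: /dev/termination-log
    # If the file is empty and the container exits with an error,
    # fall back to the tail of the container log.
    terminationMessagePolicy: FallbackToLogsOnError
```

After the container exits, the message shows up in the pod status under the container's terminated state (the message field), next to the exit code and reason.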

The last thing worth mentioning is the Termination Grace Period. The grace period is the time the kubelet gives you to shut down gracefully (by handling TERM signals) before the container is killed. You can find additional information in Termination of Pods. Raising it might be a solution if the pod needs more than the default 30 seconds to shut down.

It's also worth mentioning that you can use a script such as Kubetail to get logs from pods.
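For illustration, the grace period is set per pod with terminationGracePeriodSeconds. The value of 60 and the names below are arbitrary examples, not a recommendation.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: slow-shutdown-demo          # hypothetical name
spec:
  # Time allowed between the TERM signal and the forced kill (default is 30).
  terminationGracePeriodSeconds: 60
  containers:
  - name: app
    image: nginx                    # placeholder image
```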

Bash script that enables you to aggregate (tail/follow) logs from multiple pods into one stream. This is the same as running "kubectl logs -f " but for multiple pods.

-- PjoterS
Source: StackOverflow