I am new to Kubernetes monitoring and self-healing. What kinds of self-healing can Kubernetes provide, for example restarting a failed pod when necessary? Is there anything else, and what can Kubernetes not handle on its own?
As for monitoring, which metrics do we need to watch so that we can step in and operate on the cluster in the cases that Kubernetes self-healing does not cover?
Any ideas are welcome. Thanks.
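Configure liveness and readiness probes for pod health, together with the pod's restart policy. A failed liveness probe makes the kubelet restart the container, while a failed readiness probe takes the pod out of its Services' endpoints until it recovers. You can do more with Services and ReplicaSets.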
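To make the probe and restart-policy advice concrete, here is a minimal sketch that builds a Pod manifest as a Python dict and prints it as JSON, which `kubectl apply -f` accepts as well as YAML. The pod name, image, port, and the `/healthz` and `/ready` paths are made-up placeholders, and the timing values are arbitrary, so adjust them to whatever your application actually exposes.

```python
import json


def build_pod_manifest(name, image, port=8080):
    """Return a Pod manifest (as a dict) with liveness and readiness probes.

    All names, paths, and timings are illustrative assumptions.
    """

    def http_check(path):
        # Probe handler: HTTP GET against the container on the given port.
        return {"httpGet": {"path": path, "port": port}}

    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            # "Always" is the default; containers are restarted when they exit
            # or when the liveness probe fails.
            "restartPolicy": "Always",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "ports": [{"containerPort": port}],
                    # Failing this probe makes the kubelet restart the container.
                    "livenessProbe": {
                        **http_check("/healthz"),
                        "initialDelaySeconds": 10,
                        "periodSeconds": 10,
                        "failureThreshold": 3,
                    },
                    # Failing this probe removes the pod from Service endpoints.
                    "readinessProbe": {
                        **http_check("/ready"),
                        "periodSeconds": 5,
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    manifest = build_pod_manifest("demo-app", "example.com/demo-app:1.0")
    print(json.dumps(manifest, indent=2))
```

In practice you would usually put the same container spec into a Deployment's pod template rather than a bare Pod, so that a ReplicaSet recreates pods that are lost along with their node.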
I'm afraid your question goes beyond what can be answered here on Stack Overflow.
Yes, k8s is able to restart and reschedule pods. If you are already somewhat familiar with the key concepts, the pod lifecycle documentation is a good place to start. If you have little knowledge of k8s basics, I suggest you study Deployments, DaemonSets, Services, etc., because monitoring in k8s relies heavily on them!
You did not say what kind of metrics you are interested in. For resource metrics such as CPU and memory usage of nodes and pods, you can start with e.g. the Kubernetes Metrics Server. If you want insight into the state of k8s objects (how many services exist, pod status, restart counts, etc.), have a look at kube-state-metrics, a simple service that listens to the Kubernetes API server and generates metrics about the state of the objects.
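As a rough illustration of what you can do with kube-state-metrics output, the sketch below parses a few lines in the Prometheus text format and lists pods whose containers have restarted often. The sample data is invented, the threshold of 5 is arbitrary, and the parser is deliberately simplified (it only handles samples that carry labels and ignores escaped quotes and timestamps), so treat it as a starting point rather than a real client. Normally Prometheus would scrape the endpoint and you would alert on a query instead.

```python
import re

# Invented sample in the Prometheus text exposition format. The metric names
# are ones kube-state-metrics exposes; the label sets are shortened.
SAMPLE = """\
# TYPE kube_pod_container_status_restarts_total counter
kube_pod_container_status_restarts_total{namespace="default",pod="web-1",container="web"} 7
kube_pod_container_status_restarts_total{namespace="default",pod="web-2",container="web"} 0
kube_pod_status_ready{namespace="default",pod="web-1",condition="true"} 1
"""

LINE_RE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # simplified parser: ignore samples without labels
        name, labels, value = match.groups()
        samples.append((name, dict(LABEL_RE.findall(labels)), float(value)))
    return samples


def pods_restarting(text, threshold=5):
    """Return sorted pod names with at least `threshold` container restarts."""
    return sorted(
        labels["pod"]
        for name, labels, value in parse_metrics(text)
        if name == "kube_pod_container_status_restarts_total"
        and value >= threshold
    )


if __name__ == "__main__":
    print(pods_restarting(SAMPLE))
```

A steadily rising restart count is a typical case where self-healing keeps the pod alive but a human still needs to find out why it keeps crashing.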
Have fun with k8s
Cheers