The flanneld process on some of our Kubernetes Nodes has been known to crash, causing strange behaviour. I'd like to set up monitoring/alerting to ensure we get notified if and when flanneld crashes. We are running CoreOS as the base OS for Kubernetes.
One of the design decisions with CoreOS (as I understand it) is that there should be a bare minimum of software installed on the base OS and everything should run in a Pod/container.
So, with that in mind, I'd like to run a Pod/container to monitor the host process list, to ensure that there is always a process with the name "flanneld" running and send an alert if it is not running.
However, because every Pod/container has its own process namespace, it seems I can't run a container that has access to the host process list/tree. I've tried running a container with "privileged: true", but no luck.
Is there a way to run a container on Kubernetes that has access to the host process list/tree?
Alternatively, is there a better way of doing what I'm trying to do? Preferably without installing software directly on the CoreOS system, but rather by using a container/Pod.
One way I've found of doing it is to mount the host's /proc into the container, e.g. "-v /proc:/hostproc", and then periodically go through all of the process IDs listed under /hostproc and verify that there is (for example) a "flanneld" one there.
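A minimal sketch of that /hostproc scan, in Python, might look like the following. It assumes the host's /proc is mounted at /hostproc as described above and that the process name reported in each PID's "comm" file is exactly "flanneld"; the function name find_pids is made up for illustration, and the alerting step is left out.

```python
import os


def find_pids(name, proc_root="/hostproc"):
    """Return the PIDs under proc_root whose command name equals `name`.

    proc_root is assumed to be the host's /proc mounted into the container.
    """
    pids = []
    for entry in os.listdir(proc_root):
        # Only the numeric directories in /proc correspond to processes.
        if not entry.isdecimal():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            # The process may have exited between listdir() and open().
            continue
    return sorted(pids)
```

Run periodically, an empty result from find_pids("flanneld") would be the condition to alert on. Note that the kernel truncates "comm" to 15 characters, which is fine for "flanneld" but would matter for longer process names.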
Why not use systemd itself, and make sure that when the flannel process (service) dies or restarts you get an email, a webhook call, or some other event?
You can easily create drop-ins for systemd units in your cloud-config, much as you often do with the default flannel config, to augment the default service file as you see fit.
- name: flanneld.service
  command: start
  drop-ins:
    - name: 01-somedropin.conf
      content: |
        [Service]
        ExecStartPre=-/usr/bin/somecommand