CoreOS/Kubernetes: how to monitor Node processes?

9/23/2016

The flanneld process on some of our Kubernetes Nodes has been known to crash, causing strange behaviour. I'd like to set up monitoring/alerting so that we get notified if flanneld crashes. We are running CoreOS as the base OS for Kubernetes.

One of the design decisions with CoreOS (as I understand it) is that there should be a bare minimum of software installed on the base OS and everything should run in a Pod/container.

So, with that in mind, I'd like to run a Pod/container to monitor the host process list, to ensure that there is always a process with the name "flanneld" running and send an alert if it is not running.

However, because every Pod/container has its own process namespace, it seems I can't run a container that has access to the host process list/tree. I've tried running a container with "privileged: true", but had no luck.

Is there a way to run a container on Kubernetes that has access to the host process list/tree?

Alternatively, is there a better way of doing what I'm trying to do? Preferably without installing software directly on the CoreOS system, but rather by using a container/Pod.

-- srkiNZ84
coreos
docker
kubernetes
linux

2 Answers

9/23/2016

One way I've found of doing it is to mount the host's /proc into the container, e.g. "-v /proc:/hostproc", and then periodically go through all of the process IDs listed under /hostproc to verify that there is (for example) a "flanneld" process among them.
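As a rough illustration of that periodic scan, here is a minimal Python sketch. It assumes the host's /proc is mounted at /hostproc as described above, and it matches on the kernel's short command name in /hostproc/<pid>/comm (which Linux truncates to 15 characters, so "flanneld" fits). The function names, the check interval, and the `default_alert` hook are hypothetical placeholders, not part of any existing tool.

```python
import os
import sys
import time


def find_processes(name, proc_root="/hostproc"):
    """Return the sorted PIDs under proc_root whose command name equals `name`."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                comm = f.read().strip()
        except OSError:
            continue  # the process exited between listing and reading
        if comm == name:
            pids.append(int(entry))
    return sorted(pids)


def default_alert(name):
    """Hypothetical alert hook: swap in email, a webhook call, etc."""
    print("ALERT: no '%s' process found on host" % name, file=sys.stderr)


def watch(name, proc_root="/hostproc", interval=30, alert=default_alert,
          max_checks=None):
    """Check for `name` every `interval` seconds; call `alert` when it is missing.

    max_checks=None loops forever; a number limits the loop (handy for testing).
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        if not find_processes(name, proc_root):
            alert(name)
        checks += 1
        if max_checks is None or checks < max_checks:
            time.sleep(interval)
```

Calling `watch("flanneld")` from the container's entrypoint would then run the check indefinitely. If the Pod spec sets `hostPID: true`, the container's own /proc should already show host processes, in which case passing `proc_root="/proc"` ought to work without the extra mount.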

-- srkiNZ84
Source: StackOverflow

9/23/2016

Why not use systemd itself and make sure that when the flannel process (service) dies or restarts, you get an email, a webhook call, or some other event?

You can easily create drop-ins for systemd units in your cloud-config, as is often done with the default flannel config, to augment the default service file as you see fit.

- name: flanneld.service
  command: start
  drop-ins:
  - name: 01-somedropin.conf
    content: |
      [Service]
      ExecStartPre=-/usr/bin/somecommand

-- Radek 'Goblin' Pieczonka
Source: StackOverflow