I am trying to measure the resource usage of one-shot processes run under Kubernetes as pods. Technically, what I need is something similar to what one can find in /sys/fs/cgroup/memory/..../$container_id/memory.max_usage_in_bytes and /sys/fs/cgroup/cpu/..../$container_id/cpuacct.usage (.... stands for the parent cgroup path, whatever K8s sets it to, usually kubepods/burstable/$pod_uid).
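For reference, this is roughly what I would read on the node if I had direct access (a sketch only, assuming cgroup v1; the cgroup path below is a placeholder, since the real layout depends on the QoS class and cgroup driver):

```python
from pathlib import Path

# Hypothetical pod/container cgroup path; the real layout depends on the
# QoS class and cgroup driver. The placeholders are not real values.
CGROUP = "kubepods/burstable/pod<uid>/<container_id>"

def read_stat(controller: str, name: str) -> int:
    # cgroup v1 layout: /sys/fs/cgroup/<controller>/<cgroup path>/<file>
    return int(Path("/sys/fs/cgroup", controller, CGROUP, name).read_text())

peak_mem_bytes = read_stat("memory", "memory.max_usage_in_bytes")
cpu_time_ns = read_stat("cpuacct", "cpuacct.usage")  # often co-mounted as cpu,cpuacct
print(f"peak memory: {peak_mem_bytes} B, cpu time: {cpu_time_ns} ns")
```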
I know that some of this data is available via the cAdvisor API (which is built into the kubelet process on each node and accessible via kube-proxy). However, the cAdvisor data is delayed by some 10-20 seconds, and I do not have the option of holding on to the pod and keeping it alive (and occupying scheduling resources) for that long: the processes under watch are themselves very short-lived, and keeping them around long enough for cAdvisor to refresh would nearly double the demand on cluster resources.
Note that I do not have control over the process that runs in the pod's (single) container. The commands that start pods arrive directly at the Kubernetes API from a source that isn't mine to influence in any way, so I can't run any code in the container itself to get the stats (these are indeed available in the container's own view of /sys/fs/cgroup/*).
I also considered using an additional container in the same pod, given that the pod's containers share the same host (and even the same parent cgroup), but I don't know whether I can access the parent cgroup's statistics from a sibling container. An extra container is attractive because I have to use an additional 'no-op' container anyway: the main process is one-shot and exits once it is done, which stops the container and loses all its stats. The additional container lets me keep the parent cgroup alive and read its stats (the memory/CPU added by the extra container are constant and negligible, so the parent stats are as good as reading the container stats directly).
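Roughly, this is what I'd want the no-op sidecar to do (a sketch only; it assumes the sidecar sees the host's cgroup hierarchy unmodified, i.e. no cgroup namespace remapping, which is exactly the part I am unsure about):

```python
from pathlib import Path

def own_cgroup(controller: str) -> str:
    # Lines in /proc/self/cgroup look like
    # "4:memory:/kubepods/burstable/pod<uid>/<container_id>"
    for line in Path("/proc/self/cgroup").read_text().splitlines():
        _, controllers, path = line.split(":", 2)
        if controller in controllers.split(","):
            return path
    raise RuntimeError(f"no {controller} controller found")

# The parent of this container's cgroup should be the pod-level cgroup,
# which is shared with the sibling (work) container.
pod_cgroup = Path(own_cgroup("memory")).parent
peak_file = (Path("/sys/fs/cgroup/memory") / pod_cgroup.relative_to("/")
             / "memory.max_usage_in_bytes")
print("pod peak memory:", int(peak_file.read_text()), "bytes")
```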
Is there a way to allow a container in a pod to see the parent cgroup (or the root cgroup)? Or is there another way to retrieve these stats quickly, without having to wait more than 1-2 seconds after I find out that the container has finished running?
NOTE: I have also considered 'wrapping' the work process in a script that runs the one-shot job, takes the statistics, and only then exits. Unfortunately, this is contingent on knowing how to run that one-shot job, and I don't always have that knowledge: 'the job' is in a container image that isn't mine, and the pod spec does not necessarily include a command to run. If the pod spec doesn't specify a command, the container's default entry point is used, and that isn't visible through the Kubernetes API (which is all I have).
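For completeness, the wrapper would have looked something like this (purely illustrative; it assumes I know the command to run, which I often don't, and that the container's own cgroup v1 files are mounted at the usual places):

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical wrapper: run the real one-shot job (passed as arguments),
# then read this container's own cgroup stats before exiting.
result = subprocess.run(sys.argv[1:])

peak = int(Path("/sys/fs/cgroup/memory/memory.max_usage_in_bytes").read_text())
cpu = int(Path("/sys/fs/cgroup/cpuacct/cpuacct.usage").read_text())  # mount may be cpu,cpuacct
print(f"exit={result.returncode} peak_mem={peak}B cpu={cpu}ns", file=sys.stderr)
sys.exit(result.returncode)
```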
According to the documentation, cAdvisor collects metrics once per second and sends them to a repository once per minute. The delay can also be related to the settings of the storage backend where the repository is located. cAdvisor also has its own API where you can see the current metrics, which it holds in memory, so it is possible to collect information on each node from there.
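For example, one way to pull the in-memory node stats is through the API server's node proxy, which forwards to the kubelet's stats summary endpoint (backed by the embedded cAdvisor). A minimal sketch, assuming your credentials allow access to the node proxy subresource and using a placeholder node name; field names follow the kubelet Summary API and may vary by version:

```python
import json
import subprocess

NODE = "worker-1"  # placeholder node name

# Ask the API server to proxy to the kubelet's stats summary on that node.
raw = subprocess.run(
    ["kubectl", "get", "--raw", f"/api/v1/nodes/{NODE}/proxy/stats/summary"],
    check=True, capture_output=True, text=True,
).stdout

for pod in json.loads(raw).get("pods", []):
    ref = pod["podRef"]
    for c in pod.get("containers", []):
        print(ref["namespace"], ref["name"], c["name"],
              c.get("memory", {}).get("usageBytes"),
              c.get("cpu", {}).get("usageCoreNanoSeconds"))
```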
Collecting information from the files directly is not the right way to do it, because looking through many containers on many nodes may take a lot of time. But it is possible: for example, you can run a container on each node in privileged mode with /sys/fs/cgroup mounted and gather the information from there.
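A minimal sketch of what such a per-node container could do, assuming cgroup v1, the usual kubepods hierarchy, and that the host's /sys/fs/cgroup is mounted (e.g. via hostPath) at an arbitrary path inside the container:

```python
from pathlib import Path

# The mount point and the kubepods layout below are assumptions.
ROOT = Path("/host/cgroup/memory/kubepods")

# Walk the pod/container cgroups under kubepods and report peak memory usage.
for peak_file in ROOT.rglob("memory.max_usage_in_bytes"):
    print(peak_file.parent.name, int(peak_file.read_text()))
```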
You can also try the Kubernetes Metrics Server, which is a cluster-wide aggregator of resource usage data. Starting from Kubernetes 1.8, resource usage metrics, such as container CPU and memory usage, are available in Kubernetes through the Metrics API. These metrics can either be accessed directly by the user, for example with the kubectl top command, or used by a controller in the cluster, e.g. the Horizontal Pod Autoscaler, to make decisions.
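Besides kubectl top, the Metrics API can be queried directly; a small sketch, with placeholder namespace and pod name:

```python
import json
import subprocess

NAMESPACE, POD = "default", "my-pod"  # placeholders

# Query the Metrics API (served by Metrics Server) through the API server.
raw = subprocess.run(
    ["kubectl", "get", "--raw",
     f"/apis/metrics.k8s.io/v1beta1/namespaces/{NAMESPACE}/pods/{POD}"],
    check=True, capture_output=True, text=True,
).stdout

for c in json.loads(raw)["containers"]:
    print(c["name"], c["usage"]["cpu"], c["usage"]["memory"])
```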