How can I get the downtime of a specific deployment in Kubernetes?

8/8/2018

I have a use case where I need to collect the downtime of each deployment, where downtime means that all of its replicas (pods) are down at the same time.

My goal is to maintain the total downtime for each deployment since it was created.

I tried getting it from the deployment status, but the problem is that I need to make frequent calls to fetch the deployment and check for any downtime.

Also, the deployment status stores only the latest change, so if more than one change (i.e., downtime) occurs between two calls, I will miss the ones in between. I will also end up making frequent calls for many deployments, which will consume more compute resources.

Is there any reliable method to collect the downtime data of a deployment?

Thanks in advance.

-- Sujai Sivasamy
kubernetes
monitoring

1 Answer

8/8/2018

A monitoring tool like Prometheus would be a better solution for this. As an example, below is a graph from one of our deployments for the last 2 days:

[Graph: Deployment Availability]

If you look at the blue line for unavailable replicas, we had one replica unavailable from about 17:00 to 10:30 (ideally the unavailable count should be zero).

This seems pretty close to what you are looking for.
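
To turn such a series into a total downtime figure, something like the sketch below might work. It is only an illustration and rests on assumptions that are not part of our setup as described above: that you can export (timestamp, available replicas) samples for one deployment, for example from the kube-state-metrics series kube_deployment_status_replicas_available via a Prometheus range query, and that an interval counts as "down" when the sample at its start shows zero available replicas. The function name total_downtime_seconds is hypothetical.

```python
from typing import Iterable, Tuple


def total_downtime_seconds(samples: Iterable[Tuple[float, float]]) -> float:
    """Sum the time during which a deployment had zero available replicas.

    samples: (unix_timestamp, available_replicas) pairs, sorted by time.
    Assumption: the state seen at one sample holds until the next sample,
    so the accuracy is bounded by the scrape interval.
    """
    total = 0.0
    prev_ts = None
    prev_down = False
    for ts, available in samples:
        # Count the interval that just ended if it started in a "down" state.
        if prev_ts is not None and prev_down:
            total += ts - prev_ts
        prev_ts = ts
        prev_down = available == 0
    return total


# Illustrative data only: one sample every 30 seconds.
example = [(0, 2), (30, 0), (60, 0), (90, 1), (120, 0), (150, 3)]
print(total_downtime_seconds(example))
```

Note that the Prometheus HTTP API returns sample values as strings, so they would need converting to float before being passed in. Outages shorter than the scrape interval can still be missed, but the history is kept by Prometheus rather than by your own polling loop.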

-- Harshal Shah
Source: StackOverflow