What resource metrics (memory, CPU, etc.) should I be looking at for autoscaling?

8/14/2017

What cloud resource metrics (memory, CPU, disk I/O, etc.) should I be looking at for autoscaling? FYI, the metrics are used strictly for autoscaling. My architecture runs on Kubernetes, with Prometheus for monitoring and scraping metrics.

I have a Kubernetes cluster set up locally as well as in the cloud, and I am using Prometheus (https://prometheus.io/) to scrape system-level metrics. Now I want to add autoscaling to my system. So far I have been using Prometheus to save metrics such as "memory and CPU used, allocated, and total for the last 24 hours," and I want to save more. The metrics I am getting from Prometheus are listed at http://demo.robustperception.io:9100/metrics, but I can't decide which additional ones I will need for autoscaling. Can anyone suggest some? TIA.
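Raw "used" and "total" values are often more useful for scaling once turned into a utilization ratio. As a minimal sketch, assuming the node_exporter gauges are named node_memory_MemTotal_bytes and node_memory_MemAvailable_bytes (the names in node_exporter 0.16 and later; older releases drop the _bytes suffix), memory utilization can be derived from a scrape in the Prometheus text format. The parser and function names are hypothetical, and the parser only handles simple unlabelled lines.

```python
from typing import Dict


def parse_unlabelled_samples(text: str) -> Dict[str, float]:
    """Parse simple 'metric_name value [timestamp]' lines from a Prometheus text-format scrape."""
    samples: Dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        # Labelled series such as node_cpu_seconds_total{cpu="0",mode="idle"} are ignored here.
        if "{" in line:
            continue
        name, value = line.split()[:2]
        samples[name] = float(value)
    return samples


def memory_utilization(samples: Dict[str, float]) -> float:
    """Fraction of memory in use, assuming node_exporter >= 0.16 metric names."""
    total = samples["node_memory_MemTotal_bytes"]
    available = samples["node_memory_MemAvailable_bytes"]
    return (total - available) / total


# Illustrative scrape with made-up values.
scrape = """\
# TYPE node_memory_MemTotal_bytes gauge
node_memory_MemTotal_bytes 8000000000
node_memory_MemAvailable_bytes 2000000000
"""
print(memory_utilization(parse_unlabelled_samples(scrape)))  # 0.75
```

In PromQL the same ratio would be written as 1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes, which avoids parsing scrapes by hand.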

-- Darshil
autoscaling
kubernetes
metrics
prometheus
resources

1 Answer

8/15/2017

Normally, the common bottleneck is the memory hierarchy rather than CPU usage: the more requests your application receives, the more likely it is to run out of memory. Moreover, unless your application is a high-performance computing (HPC) workload, it is unlikely to be very CPU-intensive.
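If memory is the signal you scale on, the Kubernetes Horizontal Pod Autoscaler's documented rule is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The sketch below applies that rule to memory utilization, expressed as a percentage of each pod's memory request. The 10% tolerance band mirrors the HPA's default; the function name, replica bounds, and sample numbers are illustrative assumptions.

```python
import math


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float, min_replicas: int = 1,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """HPA-style replica count: ceil(current * currentMetric / targetMetric), clamped."""
    ratio = current_utilization / target_utilization
    # Within the tolerance band, keep the current count to avoid scaling on noise.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# Memory utilization in percent of each pod's memory request (illustrative numbers).
print(desired_replicas(4, 90, 60))  # 6
print(desired_replicas(4, 62, 60))  # 4 (within tolerance)
print(desired_replicas(4, 20, 60))  # 2
```

Using integer percentages rather than fractions such as 0.9 / 0.6 keeps the ratio exact, so floating-point rounding cannot push ceil() up by one replica.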

Disk I/O, also part of the memory hierarchy, can dramatically affect performance, so you would need to check how disk-I/O-intensive your application is. If it is heavily I/O-bound, upgrading the disk hardware could be a better solution than spinning up more instances. However, that depends on the application.
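One common way to gauge how disk-I/O-intensive a node is: take two samples of a cumulative "time spent doing I/O" counter and divide the increase by the elapsed wall-clock time. This is what PromQL's rate() computes over node_exporter's node_disk_io_time_seconds_total (node_disk_io_time_ms in older releases, in milliseconds). The sketch below assumes a counter in seconds; the function name and sample values are hypothetical.

```python
def busy_fraction(io_seconds_t0: float, io_seconds_t1: float,
                  t0: float, t1: float) -> float:
    """Fraction of wall-clock time the disk was busy between two counter samples."""
    elapsed = t1 - t0
    if elapsed <= 0:
        raise ValueError("t1 must be later than t0")
    increase = io_seconds_t1 - io_seconds_t0
    # A counter that went down was reset (e.g. node reboot); count the new value from zero,
    # similar to how Prometheus's rate() treats counter resets.
    if increase < 0:
        increase = io_seconds_t1
    return min(1.0, increase / elapsed)


print(busy_fraction(100.0, 145.0, 0.0, 60.0))  # 0.75
print(busy_fraction(500.0, 30.0, 0.0, 60.0))   # 0.5 (after a reset)
```

A value that stays close to 1.0 suggests the disk itself is saturated, which is the case where faster disks may help more than additional instances.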

In any case, it would be worth measuring the average response time and then making scaling decisions accordingly.
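Prometheus summaries and histograms expose cumulative <name>_sum and <name>_count series, so the average response time over a window is the increase in _sum divided by the increase in _count (in PromQL, rate(x_sum[5m]) / rate(x_count[5m])). The sketch below does this from two scrapes and then applies a simple threshold rule. The metric name http_request_duration_seconds, the 0.25 s threshold, and the function names are hypothetical.

```python
from typing import Optional


def average_latency(sum_t0: float, count_t0: float,
                    sum_t1: float, count_t1: float) -> Optional[float]:
    """Mean request duration over a window from cumulative _sum/_count counters."""
    requests = count_t1 - count_t0
    # No traffic in the window (or a counter reset): there is nothing meaningful to average.
    if requests <= 0:
        return None
    return (sum_t1 - sum_t0) / requests


def should_scale_out(avg_seconds: Optional[float], threshold_seconds: float = 0.25) -> bool:
    """Hypothetical rule: add instances when the mean response time exceeds a threshold."""
    return avg_seconds is not None and avg_seconds > threshold_seconds


# Two scrapes of a hypothetical http_request_duration_seconds summary or histogram.
avg = average_latency(sum_t0=10.0, count_t0=100, sum_t1=22.0, count_t1=140)
print(avg, should_scale_out(avg))  # 0.3 True
```

Response time reflects whichever resource is the actual bottleneck (memory, CPU, or disk), which is why it can be a more direct scaling signal than any single system metric.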

-- Javier Salmeron
Source: StackOverflow