How do I decide how many instances of a service I need inside Kubernetes?

11/1/2018

So basically I am starting out with Kubernetes and wanted to try some things. At this point I want to deploy a web server, a database, a Node.js server, and so on. Now, how do I decide how many instances of each of these services I need across my servers?

-- Max Gierlachowski
containers
docker
kubernetes
server

2 Answers

11/1/2018

This is a question with a complex answer that depends on your particular application's behavior and resource utilization. Put simply, the short answer is: "It depends". It depends on these main factors:

  • Application Resource Utilization
    • How much RAM, CPU, disk, sockets, etc. does your application generally use on average? At maximum? At minimum?
    • What bottlenecks or resource limits does the application bump into first? (Once you have estimates, you can encode them as Pod requests and limits; see the sketch after this list.)
    • What routines in the application might cause higher-than-normal utilization? (This is where a lot of the complexity comes in... applications are all different and perform many functions in response to inputs such as client requests. Not every function has the same behavior with respect to resource utilization.)
  • High Availability / Failover
    • One of the reasons you chose Kubernetes was probably the ease of scaling an application and making it highly available with no single point of failure.
    • This comes down to: how available do you need your application to be?
    • At the cluster/server level: how many nodes can go down or be unhealthy while still leaving enough working nodes to handle requests?
    • At the application/container level: how many Pods can go down while the remainder still handle the requests or intended operation?
    • What level of service degradation is acceptable?
  • How do the separate applications interact and behave together?
    • This is another really complicated issue that is hard to determine without observing their behavior together.
    • You can try to analyze metrics like requests per second against resource utilization and spikes. However, this can be hard to reduce to a single number or a constant, linear cause-and-effect relationship.
    • Do some requests or inputs cause a "fan out" or amplification of load on sub-components?
    • For example:
      • Are there some SQL queries that result in higher DB load than others?
      • Are there some operations that can cause higher resource utilization in Pods backing other Services?
      • How do the systems behave together in a "max load" situation?
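
To make the resource-utilization and availability questions concrete, here is a minimal sketch of how those estimates typically get encoded in Kubernetes objects. The names, images, and numbers below are all placeholders you would replace with values from your own measurements:

```yaml
# Hypothetical Deployment: the replica count and requests/limits encode
# your answers to the utilization and availability questions above.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webserver            # placeholder name
spec:
  replicas: 3                # HA: how many Pods can go down and still serve traffic?
  selector:
    matchLabels:
      app: webserver
  template:
    metadata:
      labels:
        app: webserver
    spec:
      containers:
      - name: webserver
        image: nginx:1.15    # placeholder image
        resources:
          requests:          # roughly the average utilization you measured
            cpu: 250m
            memory: 256Mi
          limits:            # the maximum before the container is throttled or killed
            cpu: 500m
            memory: 512Mi
---
# Hypothetical PodDisruptionBudget: the "acceptable degradation" question, made explicit.
apiVersion: policy/v1beta1   # policy/v1 on Kubernetes >= 1.21
kind: PodDisruptionBudget
metadata:
  name: webserver-pdb
spec:
  minAvailable: 2            # never drop below 2 serving Pods during voluntary disruptions
  selector:
    matchLabels:
      app: webserver
```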

This kind of thing is very hard to answer without doing load testing. Not many companies I've seen even do this at all! Sadly, problems like this usually end up happening in production and have to be dealt with after the fact. It ends up falling on DevOps, Ops, or the on-call engineers, which isn't the greatest scenario, because those people usually don't have full knowledge of the application's code and can't diagnose and introspect it fully.

-- TrinitronX
Source: StackOverflow

11/2/2018

If you are using Kubernetes anyway, then use the:

  • Horizontal Pod Autoscaler for stateless components such as web servers (see the HPA sketch below)

  • Horizontal Pod Autoscaler for app servers

  • a StatefulSet or operator for DB components (estimate the initial size of the cluster and manually grow it later; see the StatefulSet sketch at the end of this answer)

And it's all done.

Things will automatically grow and shrink according to the load.
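
For example, here is a minimal Horizontal Pod Autoscaler sketch. It assumes the metrics server is installed and that a Deployment named webserver exists; both the name and the numbers are placeholders:

```yaml
# Hypothetical HPA: scales the webserver Deployment between 2 and 10 replicas,
# targeting 70% average CPU utilization. Requires metrics-server to be running.
apiVersion: autoscaling/v2beta1   # v2beta1 schema; newer clusters use autoscaling/v2
                                  # with a slightly different metrics layout
kind: HorizontalPodAutoscaler
metadata:
  name: webserver-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webserver
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 70
```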

Note: use the answer by @Trin as a guideline for how to configure the autoscalers and the autoscaling criteria. There is a list of metrics exposed by the Kubernetes metrics system that you can use for autoscaling.
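
And for the database piece mentioned above, a bare-bones StatefulSet sketch. The image, storage size, and replica count are placeholders; in practice you would more likely use an operator built for your specific database:

```yaml
# Hypothetical StatefulSet for a DB: start with an estimated replica count and
# grow it manually later (e.g. kubectl scale statefulset db --replicas=5).
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db            # headless Service that gives each Pod a stable identity
  replicas: 3                # initial estimate; scaled manually, not by an HPA
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
      - name: db
        image: postgres:10   # placeholder image
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:      # one PersistentVolumeClaim per replica
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
```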

-- Ijaz Ahmad Khan
Source: StackOverflow