Why would one chose many smaller machine types instead of fewer big machine types?

11/28/2018

In a clustering high-performance computing framework such as Google Cloud Dataflow (or for that matter even Apache Spark or Kubernetes clusters etc), I would think that it's far more performant to have fewer really BIG machine types rather than many small machine types, right? As in, it's more performant to have 10 n1-highcpu-96 rather than say 120 n1-highcpu-8 machine types, because

  • the cpus can use shared memory, which is way way faster than network communications
  • if a single thread needs access to lots of memory for a single threaded operation (eg sort), it has access to that greater memory in a BIG machine rather than a smaller one

And since the price is the same (eg 10 n1-highcpu-96 costs the same as 120 n1-highcpu-8 machine types), why would anyone opt for the smaller machine types?

As well, I have a hunch that for the n1-highcpu-96 machine type, we'd occupy the whole host, so we don't need to worry about competing demands on the host by another VM from another Google cloud customer (eg contention in the CPU caches or motherboard bandwidth etc.), right?

Finally, although I don't think the google compute VMs correctly report the "true" CPU topology of the host system, if we do chose the n1-highcpu-96 machine type, the reported CPU topology may be a touch closer to the "truth" because presumably the VM is using up the whole host, so the reported CPU topology is a little closer to the truth, so any programs (eg the "NUMA" aware option in Java?) running on that VM that may attempt to take advantage of the topology has a better chance of making the "right decisions".

-- Jonathan Sylvester
google-cloud-dataflow
google-kubernetes-engine

1 Answer

11/28/2018

It will depend on many factors if you want to choose many instances with smaller machine type or a few instances with big machine types.

The VMs sizes differ not only in number of cores and RAM, but also on network I/O performance. Instances with small machine types have are limited in CPU and I/O power and are inadequate for heavy workloads.

Also, if you are planning to grow and scale it is better to design and develop your application in several instances. Having small VMs gives you a better chance of having them distributed across physical servers in the datacenter that have the best resource situation at the time the machines are provisioned.

Having a small number of instances helps to isolate fault domains. If one of your small nodes crashes, that only affects a small number of processes. If a large node crashes, multiple processes go down.

It also depends on the application you are running on your cluster and the workload.I would also recommend going through this link to see the sizing recommendation for an instance.

-- John Mathew
Source: StackOverflow