What is the recommended EC2 instance for Istio bookinfo sample application?

10/8/2019

I have an EKS cluster on AWS with Istio installed. The first time I installed Istio, I used one m3.large EC2 instance and some Istio services were stuck pending; the ingress-gateway pod's status was showing Pending.

I described the pod and saw an error about insufficient CPU. I increased the EC2 instance to m5.large and every pod started running.

We are still on staging and this is not live yet, but we are spending almost three times our initial cost.

Can someone please recommend an EC2 instance that can comfortably get Istio up and running? Let's take the bookinfo sample application as the example.

  Type     Reason            Age                   From               Message
  ----     ------            ----                  ----               -------
  Warning  FailedScheduling  2m33s (x60 over 12m)  default-scheduler  0/1 nodes are available: 1 Insufficient cpu.

It seems provisioning 2 m5.large instances worked perfectly, but this incurs more cost. Each m5.large costs 0.107 USD per hour, which is about 77 USD per month.

Having two m5.large instances incurs even more cost just to run 15 pods (5 of which are our own custom pods).

Non-terminated Pods:         (15 in total)
-- druphub
amazon-ec2
amazon-web-services
istio
kubernetes

3 Answers

11/25/2019

TL;DR: for many requirements, the default requests in Istio are extremely greedy. You need to change these with your own values.yaml (assuming you're using Helm) and monitor how much resource Istio is actually using. Using bigger and bigger instance types is a bad solution (unless you really do consume the default requests, or you like spraying money against a wall).

The problem is that Istio, when using the default profiles, makes some very large Requests. This means that even if you've got plenty of available resources, kubernetes will refuse to schedule many of the Istio control plane components.

[I'm assuming you're familiar with Kubernetes requests. If not, these are declarations in the pod yaml that say "this pod needs x cpu and y memory to run comfortably". The Kubernetes pod scheduler will then ensure that the pod is scheduled to a node that has sufficient resource. The problem is, many people stick their finger in the air and put massive values in "to be sure". But this means that huge chunks of your available resource are being wasted if the pod doesn't actually need that much to be comfortable.]
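
For reference, a request is just a few lines on each container in the pod spec. A minimal sketch (the name, image and numbers here are purely illustrative):

  apiVersion: v1
  kind: Pod
  metadata:
    name: example-app          # hypothetical name, for illustration only
  spec:
    containers:
      - name: app
        image: nginx:1.17      # any image; just an example
        resources:
          requests:            # what the scheduler reserves on a node for this container
            cpu: 100m          # 0.1 of a core
            memory: 128Mi
          limits:              # optional ceiling; exceeding the memory limit gets the container killed
            cpu: 250m
            memory: 256Mi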

In addition, each sidecar makes a sizeable Request as well, piling on the pressure.

This will be why you're seeing pods stuck in pending.

I'm not 100% convinced that the default requests set by the Istio team are actually that reasonable [edit: for bookinfo, they're certainly not. I suspect the defaults are set for even multithousand node estates]. I would recommend that, before boosting your instance sizes (and therefore your costs), you look into reducing the requests made by the Istio control and data plane.

If you then find your Istio components are being evicted often, then you've gone too far.

Example: using the supplied Helm values.yaml file here, we have for each sidecar:

  requests:
    cpu: 100m
    memory: 128Mi

(Lines 155-157).
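
If your workloads don't need that much per sidecar, those defaults can be overridden in your own values file. A minimal sketch, assuming the Istio 1.x Helm chart's global.proxy.resources key (check the exact key path against your chart version's values.yaml):

  # my-values.yaml -- numbers are illustrative; measure before committing to them
  global:
    proxy:
      resources:
        requests:
          cpu: 50m       # down from the default 100m
          memory: 64Mi   # down from the default 128Mi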

More worryingly, the default memory request for Pilot is 2Gb! That means you're going to be giving away a massive chunk (or maybe the whole) of a Node. And that's just for Pilot - the same story is true for Galley, Citadel, Telemetry, etc, etc, etc.

You need to monitor a running cluster to determine whether these values can be reduced. For example, I have a reasonably busy cluster (way more complicated than the wretched bookinfo), and metrics server is telling me Pilot's cpu is 8 millicores(!) and its memory 62Mi. So if I'd blindly stuck with the defaults, which most people do, I'd be wasting nearly 2Gb of memory and half a CPU.

See my output here: I stress this is from a long running, production standard cluster:

[ec2-user@ip-172-31-33-8 ~]$ kubectl top pod -n istio-system
NAME                                      CPU(cores)   MEMORY(bytes)
istio-citadel-546575dc4b-crnlj            1m           14Mi
istio-galley-6679f66459-4rlrk             19m          17Mi
istio-ingressgateway-b9f65784b-k64th      1m           22Mi
istio-pilot-67bfb94df4-j7vld              8m           62Mi
istio-policy-598b768ddc-cvs2b             5m           39Mi
istio-sidecar-injector-578bc4cc74-n5v6w   11m          7Mi
istio-telemetry-cd6fddc4b-lt8rl           27m          57Mi
prometheus-6ccfbc78c-w4dd6                25m          497Mi

A more readable guide to the defaults is here. Run through the requests for the whole of the control plane and add up the required cpu and memory. It's a lot of resource.

This is hard work, but you need to sit down and work out what each component really needs, set up your own values.yaml, and generate your own yaml for Istio. The demo yamls provided by Istio are not reasonable, especially for Mickey Mouse apps like bookinfo, which should be taken out the back door and put out of its misery. Bear in mind Istio was originally developed alongside massive multi-thousand-node clusters.
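
As a hedged starting point (key paths vary between chart versions, so compare against the chart's own values.yaml), an override file that pulls the control plane requests down towards the measured usage above might look something like this:

  # my-values.yaml -- illustrative numbers loosely based on the kubectl top output above;
  # monitor your own cluster and adjust before relying on them
  pilot:
    resources:
      requests:
        cpu: 100m        # chart default is 500m; measured usage above was 8m
        memory: 128Mi    # chart default is 2048Mi; measured usage above was 62Mi
  gateways:
    istio-ingressgateway:
      resources:
        requests:
          cpu: 50m
          memory: 64Mi
  # Galley, Citadel, Telemetry, etc. take similar resources blocks under their own keys

Render it against the Istio chart with helm template --values my-values.yaml (or apply it with helm upgrade if you installed via Helm), then keep watching kubectl top and the events feed; as above, if components start getting evicted you've cut too deep.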

-- Dick Chesterwood
Source: StackOverflow

10/8/2019

The deployment is made up of a number of different components. Some of them, such as Pilot, have a large impact in terms of memory and CPU, so it is recommended to have around 8 GB of memory and 4 CPUs free in your cluster. Obviously, all components have requested resources defined, so if you don't have enough capacity you will see pods not starting.

You are using m5.large, whose spec is:

m5.large     2 vCPU    8 GiB memory     EBS-only

so based on the above requirement, you need:

m5.xlarge    4 vCPU    16 GiB memory    EBS-only
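
If you manage the worker nodes with eksctl, a sketch of a node group config using m5.xlarge might look like this (the cluster name, region and capacity below are placeholders for your own setup):

  # cluster.yaml -- illustrative only; apply with: eksctl create nodegroup --config-file cluster.yaml
  apiVersion: eksctl.io/v1alpha5
  kind: ClusterConfig
  metadata:
    name: istio-staging      # placeholder cluster name
    region: us-east-1        # placeholder region
  nodeGroups:
    - name: istio-workers
      instanceType: m5.xlarge
      desiredCapacity: 1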

If your application needs high compute, then you may try a compute optimized instance.

Compute optimized instances are ideal for compute-bound applications that benefit from high-performance processors. They are well suited for the following applications:

Batch processing workloads

Media transcoding

High-performance web servers

High-performance computing (HPC)

Scientific modeling

Dedicated gaming servers and ad serving engines

Machine learning inference and other compute-intensive applications

compute-optimized-instances

deploying Istio on AWS and Azure recommendations

might help you

https://aws.amazon.com/blogs/opensource/getting-started-istio-eks/

-- Adiii
Source: StackOverflow

10/8/2019

If you look at the AWS instance types listing an m5.large instance is pretty small: it only has 2 CPU cores. On the other hand, if you look at the kubectl get pods --all-namespaces listing, you can see there are quite a few pods involved to run the core Kubernetes system (and several of those are replicated on each node in a multi-node installation).

If 2 cores isn't enough, you can try picking larger instance sizes; if 2x m5.large works, then 1x m5.xlarge gives you the same capacity at the same cost and will be slightly better (one bigger node is easier to schedule onto than two smaller ones). If you're just running demo applications like this then the "c" family has half the memory (2 GiB per core) and is slightly cheaper, so you might try a c5.xlarge.

For medium-sized workloads, I'd suggest figuring out your total cluster requirements (based on either pods' resource requests or actual statistics from a tool like Prometheus); dividing that across some number of worker nodes, such that losing one won't be a significant problem (maybe 7 or 9); then selecting the instance size that fits that. It will be easier to run on fewer, larger nodes than more, smaller nodes (there are more places to fit that one pod that requires 8 GB of RAM).

(I routinely need to allocate 4-8 GB of memory for desktop environments like Docker Desktop for Mac or kind and still find it cramped; CPU isn't usually my limitation but I could easily believe that 2 cores and 8 GiB of RAM isn't enough.)

(And yes, AWS is pretty expensive for personal projects without an obvious revenue stream attached to them. You could get that m5.large instance for about $500/year if you were willing to pay that amount up front but that can still be a lot of money to just play around with things.)

-- David Maze
Source: StackOverflow