I have an EKS cluster on AWS with Istio installed. The first time I installed Istio I used a single m3.large EC2 instance, and some Istio services were stuck: the ingress-gateway pods showed a status of Pending.
I described the pod and saw an "insufficient CPU" error. I switched the EC2 instance to an m5.large and all the pods started running.
We are still on staging and not live yet, but we are already spending almost three times our initial cost.
Can someone please recommend an EC2 instance type that can comfortably get Istio up and running? Let's take the bookinfo sample application as the workload.
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 2m33s (x60 over 12m) default-scheduler 0/1 nodes are available: 1 Insufficient cpu.
Provisioning two m5.large instances worked perfectly, but it costs more. Each m5.large costs 0.107 USD/hour, which is about 77 USD/month.
Having two m5.large instances doubles that cost just to run 15 pods (5 of them custom pods).
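The arithmetic behind those monthly figures can be checked with a short sketch. The 0.107 USD/hour rate is the one quoted in the question; the 30-day month and the helper name `monthly_cost` are assumptions made for illustration.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: a 30-day month


def monthly_cost(hourly_usd, instances=1, hours=HOURS_PER_MONTH):
    """On-demand cost in USD for running `instances` machines all month."""
    return hourly_usd * hours * instances


M5_LARGE_HOURLY = 0.107  # rate quoted in the question

one_node = monthly_cost(M5_LARGE_HOURLY)
two_nodes = monthly_cost(M5_LARGE_HOURLY, instances=2)
print(f"1 x m5.large: {one_node:.2f} USD/month")
print(f"2 x m5.large: {two_nodes:.2f} USD/month")
```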
Non-terminated Pods: (15 in total)
TL;DR for many requirements the default requests in Istio are extremely greedy. You need to change these with your own values.yaml (assuming you're using Helm) and monitor how much resource Istio is actually using. Using bigger and bigger instance types is a bad solution (unless you really do consume the default requests, or you like spraying money against a wall).
The problem is that Istio, when using the default profiles, makes some very large Requests. This means that even if you've got plenty of available resources, kubernetes will refuse to schedule many of the Istio control plane components.
[I'm assuming you're familiar with Kubernetes requests. If not, these are declarations in the pod yaml that say "this pod needs x cpu and y memory to run comfortably". The Kubernetes pod scheduler will then ensure that the pod is scheduled to a node that has sufficient resource. The problem is, many people stick their finger in the air and put massive values in "to be sure". But this means that huge chunks of your available resource are being wasted if the pod doesn't actually need that resource to be comfortable.]
In addition, each sidecar makes a sizeable Request as well, piling on the pressure.
This will be why you're seeing pods stuck in pending.
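To make the effect of large requests concrete, here is a minimal sketch of the CPU check for a single node. It is not the real scheduler (which also considers memory, multiple nodes, affinity and so on), and the allocatable figure and per-pod requests below are illustrative values, not numbers read from a real cluster.

```python
def schedule(allocatable_mcpu, pods):
    """Place pods in order on one node by CPU request (millicores).

    Returns (scheduled, pending) lists of pod names. A pod is left
    pending when its request exceeds what is still unreserved,
    regardless of how little CPU the running pods actually use.
    """
    remaining = allocatable_mcpu
    scheduled, pending = [], []
    for name, request_mcpu in pods:
        if request_mcpu <= remaining:
            remaining -= request_mcpu
            scheduled.append(name)
        else:
            pending.append(name)
    return scheduled, pending


# Illustrative requests only (hypothetical values).
greedy = [
    ("kube-system (total)", 400),
    ("pilot", 500),
    ("telemetry", 1000),
    ("ingressgateway", 100),
    ("app + sidecar", 200),
]
# Same pods with the control-plane requests trimmed (also hypothetical).
trimmed = [
    ("kube-system (total)", 400),
    ("pilot", 50),
    ("telemetry", 100),
    ("ingressgateway", 50),
    ("app + sidecar", 200),
]

NODE_MCPU = 1900  # hypothetical allocatable CPU on a 2-vCPU node

print(schedule(NODE_MCPU, greedy))
print(schedule(NODE_MCPU, trimmed))
```

With the greedy requests the last two pods stay pending even though nothing is actually busy; with the trimmed requests everything fits on the same node.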
I'm not 100% convinced that the default requests set by the Istio team are actually that reasonable [edit: for bookinfo, they're certainly not. I suspect the defaults are set for even multithousand node estates]. I would recommend that before boosting your instance sizes (and therefore your costs), look into reducing the requests made by the Istio control and data plane.
If you then find your Istio components are being evicted often, then you've gone too far.
Example: using the supplied Helm values.yaml file here, we have for each sidecar:
requests:
cpu: 100m
memory: 128Mi
(Lines 155-157).
More worryingly, the default memory request for Pilot is 2Gb! That means you're going to be giving away a massive chunk (or maybe the whole) of a Node. That's just for Pilot - the same story is true for Galley, Citadel, Telemetry, etc, etc, etc.
You need to monitor a running cluster and determine whether these values can be reduced. For example, I have a reasonably busy cluster (way more complicated than the wretched bookinfo), and metrics server is telling me Pilot's cpu is 8 millicores(!) and its memory is 62Mi. So if I'd blindly stuck with the defaults, which most people do, I'd be wasting nearly 2Gb of memory and half a CPU.
See my output here: I stress this is from a long running, production standard cluster:
[ec2-user@ip-172-31-33-8 ~]$ kubectl top pod -n istio-system
NAME CPU(cores) MEMORY(bytes)
istio-citadel-546575dc4b-crnlj 1m 14Mi
istio-galley-6679f66459-4rlrk 19m 17Mi
istio-ingressgateway-b9f65784b-k64th 1m 22Mi
istio-pilot-67bfb94df4-j7vld 8m 62Mi
istio-policy-598b768ddc-cvs2b 5m 39Mi
istio-sidecar-injector-578bc4cc74-n5v6w 11m 7Mi
istio-telemetry-cd6fddc4b-lt8rl 27m 57Mi
prometheus-6ccfbc78c-w4dd6 25m 497Mi
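As a rough sketch of how this kind of output could be turned into candidate requests, the snippet below parses two rows of the listing and compares Pilot's usage with the defaults mentioned above (2Gb of memory, half a CPU). The 3x headroom factor and the function names are assumptions for illustration; a single `kubectl top` snapshot is not enough to size anything, so treat the result as a starting point to validate against longer-term monitoring.

```python
import math

# Two rows copied from the `kubectl top pod -n istio-system` output above.
TOP_OUTPUT = """\
NAME CPU(cores) MEMORY(bytes)
istio-pilot-67bfb94df4-j7vld 8m 62Mi
istio-telemetry-cd6fddc4b-lt8rl 27m 57Mi
"""


def parse_top(text):
    """Map pod name -> (millicores, MiB).

    Assumes the 'm' and 'Mi' suffixes seen in the sample output.
    """
    usage = {}
    for line in text.strip().splitlines()[1:]:  # skip the header row
        name, cpu, mem = line.split()
        if not (cpu.endswith("m") and mem.endswith("Mi")):
            raise ValueError(f"unexpected units in line: {line!r}")
        usage[name] = (int(cpu[:-1]), int(mem[:-2]))
    return usage


def suggest_request(cpu_m, mem_mi, headroom=3.0):
    """Observed usage times a headroom factor (3x is an arbitrary assumption)."""
    return math.ceil(cpu_m * headroom), math.ceil(mem_mi * headroom)


PILOT_DEFAULT = (500, 2048)  # half a CPU and 2Gb, as described above

usage = parse_top(TOP_OUTPUT)
pilot_cpu, pilot_mem = usage["istio-pilot-67bfb94df4-j7vld"]
unused_cpu = PILOT_DEFAULT[0] - pilot_cpu
unused_mem = PILOT_DEFAULT[1] - pilot_mem

print(f"Pilot uses {pilot_cpu}m / {pilot_mem}Mi")
print(f"Reserved but unused: {unused_cpu}m / {unused_mem}Mi")
print("Candidate request:", suggest_request(pilot_cpu, pilot_mem))
```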
A more readable guide to the defaults is here. Run through the requests for the whole of the control plane and add up the required cpu and memory. It's a lot of resource.
This is hard work, but you need to sit down and work out what each component really needs, set up your own values.yaml and generate your own yaml for Istio. The demo yamls provided by Istio are not reasonable, especially for Mickey Mouse apps like bookinfo, which should be taken out the back door and put out of its misery. Bear in mind Istio was developed originally alongside massive multi thousand node clusters.
The deployment is made up of a number of different components. Some of them, such as Pilot, have a large impact in terms of memory and CPU, so it is recommended to have around 8GB of memory and 4 CPUs free in your cluster. Obviously, all components have requested resources defined, so if you don't have enough capacity you will see pods not starting.
You are using an m5.large, whose spec is:
m5.large 2 vCPU 8 GiB memory EBS-only
So, based on the requirement above, you need:
m5.xlarge 4 vCPU 16 GiB memory EBS-only
If your application needs a lot of compute, then you may want to try a compute optimized instance.
Compute optimized instances are ideal for compute-bound applications that benefit from high-performance processors. They are well suited for the following applications:
Batch processing workloads
Media transcoding
High-performance web servers
High-performance computing (HPC)
Scientific modeling
Dedicated gaming servers and ad serving engines
Machine learning inference and other compute-intensive applications
The recommendations for deploying Istio on AWS and Azure might help you:
https://aws.amazon.com/blogs/opensource/getting-started-istio-eks/
If you look at the AWS instance types listing, an m5.large instance is pretty small: it only has 2 CPU cores. On the other hand, if you look at the kubectl get pods --all-namespaces listing, you can see there are quite a few pods involved to run the core Kubernetes system (and several of those are replicated on each node in a multi-node installation).
If 2 cores isn't enough, you can try picking larger instance sizes; if 2x m5.large works then 1x m5.xlarge (4 cores, 16 GiB) will be slightly better and the same cost. If you're just running demo applications like this then the "c" family has half the memory (2 GiB per core) and is slightly cheaper so you might try a c5.xlarge.
For medium-sized workloads, I'd suggest figuring out your total cluster requirements (based on either pods' resource requests or actual statistics from a tool like Prometheus); dividing that across some number of worker nodes, such that losing one won't be a significant problem (maybe 7 or 9); then selecting the instance size that fits that. It will be easier to run on fewer, larger nodes than more, smaller nodes (there are more places to fit that one pod that requires 8 GB of RAM).
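The sizing procedure in the previous paragraph can be sketched roughly as follows. The m5 vCPU and memory figures are the published instance specs; the workload totals, the 20% per-node overhead reserved for system pods, and the function name `pick_instance` are assumptions made up for the example.

```python
# (name, vCPU, memory in GiB) for part of the m5 family, smallest first.
M5_SIZES = [
    ("m5.large", 2, 8),
    ("m5.xlarge", 4, 16),
    ("m5.2xlarge", 8, 32),
    ("m5.4xlarge", 16, 64),
]


def pick_instance(total_cpu, total_mem_gib, nodes, overhead=0.2):
    """Smallest size such that (nodes - 1) nodes can still hold the workload.

    `overhead` is the fraction of each node assumed to be taken by
    system pods and the kubelet (a hypothetical 20% by default).
    Returns None if no listed size is big enough.
    """
    if nodes < 2:
        raise ValueError("need at least 2 nodes to tolerate losing one")
    surviving = nodes - 1
    need_cpu = total_cpu / surviving / (1 - overhead)
    need_mem = total_mem_gib / surviving / (1 - overhead)
    for name, cpu, mem in M5_SIZES:
        if cpu >= need_cpu and mem >= need_mem:
            return name
    return None


# Hypothetical cluster-wide totals: 20 cores and 60 GiB of requests.
print(pick_instance(20, 60, nodes=7))
print(pick_instance(20, 60, nodes=9))
```

Trying a few node counts this way shows the trade-off between fewer, larger nodes and more, smaller ones before committing to an instance type.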
(I routinely need to allocate 4-8 GB of memory for desktop environments like Docker Desktop for Mac or kind and still find it cramped; CPU isn't usually my limitation but I could easily believe that 2 cores and 8 GiB of RAM isn't enough.)
(And yes, AWS is pretty expensive for personal projects without an obvious revenue stream attached to them. You could get that m5.large instance for about $500/year if you were willing to pay that amount up front but that can still be a lot of money to just play around with things.)