I am using Istio in an AWS EKS cluster, with the pre-installed Prometheus and Grafana to monitor pods, the Istio mesh, and Istio service workloads.
I have three services running in three different namespaces:
Service 1:- service1.namespace1.svc.cluster.local
Service 2:- service2.namespace2.svc.cluster.local
Service 3:- service3.namespace3.svc.cluster.local
I can find the latency of each service endpoint from the Istio Service Dashboard in Grafana. But it only shows the latency for the service endpoint as a whole, not per path prefix. The overall service endpoint latency looks fine, but I want to check which path is taking time within a service endpoint.
Let's say the P50 latency of service1.namespace1.svc.cluster.local is 2.91 ms, but I also want to check the latency of each path. It has four paths:
service1.namespace1.svc.cluster.local/login => Login Path, P50 Latency = ?
service1.namespace1.svc.cluster.local/signup => Signup Path, P50 Latency = ?
service1.namespace1.svc.cluster.local/auth => Auth Path, P50 Latency = ?
service1.namespace1.svc.cluster.local/list => List Path, P50 Latency = ?
I am not sure whether this is possible with the Prometheus and Grafana stack. What is the recommended way to achieve it?
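For reference, I believe the dashboard's service-level P50 comes from a query along these lines (assuming the telemetry v2 histogram istio_request_duration_milliseconds that ships with Istio 1.5; the 1.4 Mixer telemetry used istio_request_duration_seconds instead):

    # P50 across all requests to service1, all paths mixed together
    histogram_quantile(0.50,
      sum(rate(istio_request_duration_milliseconds_bucket{
        destination_service="service1.namespace1.svc.cluster.local"
      }[5m])) by (le)
    )

None of the labels on this metric carry the request path, which is exactly what I am missing.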
istioctl version --remote
client version: 1.5.1
internal-popcraftio-ingressgateway version:
citadel version: 1.4.3
galley version: 1.4.3
ingressgateway version: 1.5.1
pilot version: 1.4.3
policy version: 1.4.3
sidecar-injector version: 1.4.3
telemetry version: 1.4.3
pilot version: 1.5.1
office-popcraftio-ingressgateway version:
data plane version: 1.4.3 (83 proxies), 1.5.1 (4 proxies)
To my knowledge, this is not something that the Istio metrics can provide. However, you should take a look at the metrics that your server framework provides, if any; this is application (framework) dependent. See for instance Spring Boot ( https://docs.spring.io/spring-metrics/docs/current/public/prometheus ) or Vert.x ( https://vertx.io/docs/vertx-micrometer-metrics/java/#_http_server ).
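As an illustration, here is a minimal sketch of what such framework-side instrumentation can look like with Micrometer (the metrics library behind both the Spring Boot and Vert.x integrations linked above); the metric name, tag, and handler wiring are assumptions for the example, not something Istio provides:

    import io.micrometer.core.instrument.Timer;
    import io.micrometer.prometheus.PrometheusConfig;
    import io.micrometer.prometheus.PrometheusMeterRegistry;

    public class LoginMetrics {
        // Registry whose scrape() output you would expose on /metrics for Prometheus
        static final PrometheusMeterRegistry registry =
                new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);

        // One timer per path template; the "uri" tag holds the template,
        // so the label set stays small and bounded
        static final Timer loginTimer = Timer.builder("http.server.requests")
                .tag("uri", "/login")
                .publishPercentileHistogram() // emit _bucket series for histogram_quantile()
                .register(registry);

        static void handleLogin(Runnable handler) {
            loginTimer.record(handler); // wall-clock latency of the /login handler
        }
    }

Spring Boot's actuator does this wiring for you and tags each request with its URI template, so in practice you mostly just need to enable the percentile histogram.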
One thing to be aware of with HTTP path-based metrics is that they can make the metrics cardinality explode if not used with care. Imagine some of your paths contain unbounded dynamic values (e.g. /object/123456, with 123456 being an ID): if that path is stored as a Prometheus label, it means that under the hood Prometheus will create one time series per ID, which is likely to cause performance issues on Prometheus and risks out-of-memory errors in your app.
This is, I think, a good reason for Istio NOT to provide path-based metrics. Frameworks, on the other hand, can have sufficient knowledge to provide metrics based on the path template instead of the actual path (e.g. /object/$ID instead of /object/123456), which solves the cardinality problem.
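Once a framework exposes such templated metrics, the per-path P50 you are after becomes an ordinary histogram_quantile query; this sketch assumes Micrometer's default http_server_requests_seconds naming:

    # P50 latency of the /login handler only
    histogram_quantile(0.50,
      sum(rate(http_server_requests_seconds_bucket{uri="/login"}[5m])) by (le, uri)
    )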
PS: Kiali has some documentation about runtimes monitoring that may help: https://kiali.io/documentation/runtimes-monitoring/