Kubernetes Istio latency path wise in Grafana

4/21/2020

I am using Istio in an AWS EKS cluster. I am using the pre-installed Prometheus and Grafana to monitor pods, the Istio mesh, and Istio service workloads.

I have three services running in three different namespaces:

Service 1:- service1.namespace1.svc.cluster.local
Service 2:- service2.namespace2.svc.cluster.local
Service 3:- service3.namespace3.svc.cluster.local

I can find the latency of each service endpoint from the Istio Service Dashboard in Grafana. However, it only shows the latency per service endpoint, not per path prefix. The overall service endpoint latency is fine, but I want to check which path is taking time within a service endpoint.

Let's say the P50 latency of service1.namespace1.svc.cluster.local is 2.91ms, but I also want to check the latency of each path. It has four paths:

service1.namespace1.svc.cluster.local/login => Login path, P50 Latency = ?
service1.namespace1.svc.cluster.local/signup => Signup path, P50 Latency = ?
service1.namespace1.svc.cluster.local/auth => Auth path, P50 Latency = ?
service1.namespace1.svc.cluster.local/list => List path, P50 Latency = ?

I am not sure if this is possible with the Prometheus and Grafana stack. What is the recommended way to achieve it?

istioctl version --remote

client version: 1.5.1
internal-popcraftio-ingressgateway version: 
citadel version: 1.4.3
galley version: 1.4.3
ingressgateway version: 1.5.1
pilot version: 1.4.3
policy version: 1.4.3
sidecar-injector version: 1.4.3
telemetry version: 1.4.3
pilot version: 1.5.1
office-popcraftio-ingressgateway version: 
data plane version: 1.4.3 (83 proxies), 1.5.1 (4 proxies)
-- codekube
amazon-eks
amazon-web-services
istio
kubectl
kubernetes

1 Answer

4/22/2020

To my knowledge this is not something that the Istio metrics can provide. However, you should take a look at the metrics that your server framework provides, if any; this is application (framework)-dependent. See for instance Spring Boot ( https://docs.spring.io/spring-metrics/docs/current/public/prometheus ) or Vert.x ( https://vertx.io/docs/vertx-micrometer-metrics/java/#_http_server ).
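To illustrate the framework-side approach, here is a minimal sketch using the Python prometheus_client library (the metric name, label name, and recorded values are my own assumptions for the example, not anything Istio emits): the application records request latency in a histogram labeled by path, and the per-path P50 is then computed at query time.

```python
from prometheus_client import Histogram, REGISTRY

# Hypothetical per-path latency histogram; the metric and label
# names here are assumptions chosen for the example.
REQUEST_LATENCY = Histogram(
    "http_request_duration_seconds",
    "HTTP request latency by path",
    ["path"],
)

def record(path: str, seconds: float) -> None:
    """Record one request's latency under its path label."""
    REQUEST_LATENCY.labels(path=path).observe(seconds)

# Simulate a few requests against paths from the question.
record("/login", 0.003)
record("/signup", 0.002)
record("/login", 0.005)

# The /metrics scrape output now contains one histogram series set
# per path, e.g. http_request_duration_seconds_count{path="/login"}.
count = REGISTRY.get_sample_value(
    "http_request_duration_seconds_count", {"path": "/login"}
)
print(count)  # 2.0
```

With such a metric scraped by Prometheus, a Grafana panel could estimate the per-path P50 with a query along these lines: histogram_quantile(0.50, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, path)).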

One thing to be aware of with HTTP path-based metrics is that they are likely to make the metrics cardinality explode if not used with care. Imagine some of your paths contain unbounded dynamic values (e.g. /object/123465, with 123465 being an ID): if that path is stored as a Prometheus label, Prometheus will under the hood create one time series per ID, which is likely to cause performance issues on Prometheus and risks out-of-memory errors in your app.

This is, I think, a good reason for Istio NOT to provide path-based metrics. Frameworks, on the other hand, can have sufficient knowledge to provide metrics based on the path template instead of the actual path (e.g. /object/$ID instead of /object/123465), which solves the cardinality problem.
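A minimal sketch of that normalization step (the regex and the $ID placeholder are my own assumptions; real frameworks derive the template from their route definitions instead): collapse dynamic numeric segments to a fixed placeholder before using the path as a metric label, so the label set stays bounded.

```python
import re

# Assumed convention: purely numeric path segments are unbounded IDs.
_ID_SEGMENT = re.compile(r"/\d+(?=/|$)")

def path_template(path: str) -> str:
    """Replace dynamic numeric segments with a fixed placeholder so
    the number of distinct label values stays bounded."""
    return _ID_SEGMENT.sub("/$ID", path)

print(path_template("/object/123465"))        # /object/$ID
print(path_template("/object/123465/items"))  # /object/$ID/items
print(path_template("/login"))                # /login
```

Recording latency against path_template(path) rather than the raw path keeps one time series per route instead of one per ID.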

PS: Kiali has some documentation about runtimes monitoring that may help: https://kiali.io/documentation/runtimes-monitoring/

-- Joel
Source: StackOverflow