How to investigate latency spikes in OpenShift

3/10/2021

We have recurring latency spikes in our OpenShift cluster.

How can we (besides installing Istio - which is on the way) measure these latencies to get more information?

Is there a Helm chart out there for such a purpose?

Here is a result from our Gatling test: [image]

-- user2656732
kubernetes
latency
openshift

1 Answer

3/10/2021

Measuring latency requires distributed tracing (DT), and DT requires adding some lines to your code. In fact, even with Istio you need to add some lines to your code if you want distributed tracing. That is why you will probably never find a Helm chart for that.

The way to go would be to collect the data through the OpenTracing API (now OpenTelemetry) and send it to a DT backend such as Jaeger or Zipkin.

About modifying your code: with the API, you manually start a trace and add spans to it, where a span is an individual unit of work you want to measure. So you would call start_span and stop_span wherever you want. You might have several spans in one service, or just one. For the other services to add their spans to the same trace, you pass a context from one service to the next.
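To make the start_span / stop_span idea concrete, here is a minimal sketch in plain Python. It is not the OpenTracing or OpenTelemetry API; every name in it (Span, FINISHED, context_of, service_a, service_b) is hypothetical and only illustrates how a context carries the trace from one service to the next.

```python
import time
import uuid
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    """One measured unit of work inside a trace (toy model, not a real API)."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    name: str
    start: float
    end: Optional[float] = None

    @property
    def duration_ms(self) -> float:
        if self.end is None:
            raise ValueError("span not finished")
        return (self.end - self.start) * 1000.0


# Finished spans; a real tracer would export these to a backend such as Jaeger
FINISHED: List[Span] = []


def start_span(name: str, context: Optional[Dict[str, str]] = None) -> Span:
    """Start a span; continue the caller's trace when a context is given."""
    trace_id = context["trace_id"] if context else uuid.uuid4().hex
    parent_id = context["span_id"] if context else None
    return Span(trace_id, uuid.uuid4().hex[:16], parent_id, name, time.perf_counter())


def stop_span(span: Span) -> None:
    """Record the end time and hand the span over for export."""
    span.end = time.perf_counter()
    FINISHED.append(span)


def context_of(span: Span) -> Dict[str, str]:
    """The context one service passes to the next (e.g. as HTTP headers)."""
    return {"trace_id": span.trace_id, "span_id": span.span_id}


def service_b(context: Dict[str, str]) -> None:
    # Hypothetical downstream service: adds its span to the caller's trace
    span = start_span("service-b: do work", context)
    time.sleep(0.01)  # stand-in for the work being measured
    stop_span(span)


def service_a() -> Span:
    # Hypothetical entry service: starts the trace, then calls service_b
    root = start_span("service-a: handle request")
    service_b(context_of(root))
    stop_span(root)
    return root


if __name__ == "__main__":
    service_a()
    for s in FINISHED:
        print(f"{s.name}: {s.duration_ms:.1f} ms (trace {s.trace_id})")
```

In a real setup the context would travel in request headers and the finished spans would be sent to the tracing backend instead of being kept in a list.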

With Istio it is a little different. You don't start or stop spans yourself; instead, each service is a span. You pass some headers, created by the first proxy, from one service to the next, and Istio does the start_span and stop_span for each service. So with Istio you can't have several spans per service, only one.
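The "some lines" you still need with Istio usually amount to copying the tracing headers from the incoming request onto every outgoing request. Here is a minimal sketch, assuming B3-style and W3C Trace Context header names. The exact list depends on your Istio version and tracing backend, so check it against the Istio documentation. The function name forward_trace_headers is hypothetical.

```python
from typing import Dict, Mapping

# Assumed header list: x-request-id plus B3 and W3C Trace Context headers.
# Verify against the documentation of the Istio version you run.
TRACE_HEADERS = (
    "x-request-id",
    "x-b3-traceid",
    "x-b3-spanid",
    "x-b3-parentspanid",
    "x-b3-sampled",
    "x-b3-flags",
    "traceparent",
    "tracestate",
)


def forward_trace_headers(incoming: Mapping[str, str]) -> Dict[str, str]:
    """Return only the tracing headers of an incoming request.

    Header names are matched case-insensitively; all other headers
    (for example Authorization) are deliberately left out.
    """
    lowered = {name.lower(): value for name, value in incoming.items()}
    return {h: lowered[h] for h in TRACE_HEADERS if h in lowered}
```

The returned dictionary would then be merged into the headers of each outgoing call the service makes, so that the sidecar proxies can attach their spans to the same trace.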

So the OpenTracing API is much harder to implement, but you have complete control over what you are measuring; Istio is easier to implement, but comes with some limitations.

Now, you usually don't need more than one span in a service. Since these are microservices, they don't do many things. But the biggest limitation is that you can't measure database connections with Istio: on the database side there is no code to handle these headers, just the database itself, so you need the Envoy proxies to support tracing for that specific database.

-- suren
Source: StackOverflow