The title isn't exactly what happens, but I can't describe the problem in one sentence. Here's what really happens.
So I have a ranking program that runs as a Flask app deployment on k8s. The program listens for queries from the user, asks 3 Elasticsearch indices (Elasticsearch runs in the same cluster as this deployment) for similar docs and their details using grequests, does a bit of calculation, and ranks them.
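For context, the fan-out described above (one query hitting three indices concurrently) looks roughly like this. The sketch below uses the standard library's ThreadPoolExecutor in place of grequests, and the index names and fetch stub are placeholders, not the real code:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical index names -- stand-ins for the three real Elasticsearch indices.
INDICES = ["docs_title", "docs_body", "docs_meta"]

def fetch_similar(index, query, top_k):
    """Stub for one Elasticsearch search; the real code would POST the query
    to each index's _search endpoint via grequests instead."""
    return {"index": index, "hits": [f"{index}-hit-{i}" for i in range(top_k)]}

def get_async_resp(query, top_k):
    # Fire the three index lookups concurrently and collect all responses.
    with ThreadPoolExecutor(max_workers=len(INDICES)) as pool:
        futures = [pool.submit(fetch_similar, idx, query, top_k) for idx in INDICES]
        return [f.result() for f in futures]

responses = get_async_resp("example query", top_k=2)
```

The point is only that the "first call" is one concurrent round trip to three indices, so its duration is dominated by the slowest of the three responses.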
The main function looks like this:
    def get_rankings(self, query):
        self.query = query
        t0 = time()
        responses = self.get_async_resp(self.query.top_k)  # first call
        t1 = time()
        self.candidates = self.process_resp(responses)
        t2 = time()
        self.fillout_solution_info()  # second call
        t3 = time()
        self.filter_and_calculate()
        t4 = time()
        self.aggregate()
        t5 = time()
        self.jsonify()
        time_info = "First call: %f,\nProcess ES resp: %f,\nSecond call: %f,\nCalculate signals: %f,\nRank candidates: %f" % (
            t1 - t0, t2 - t1, t3 - t2, t4 - t3, t5 - t4)
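As an aside on the measurement itself: time() is the wall clock and can jump, while time.perf_counter() is the monotonic clock intended for measuring intervals. A self-contained sketch of the same stage-timing pattern, where the stage functions are placeholders rather than the real ranking code:

```python
from time import perf_counter

def first_call():
    pass  # placeholder for self.get_async_resp(...)

def process_resp():
    pass  # placeholder for self.process_resp(...)

def timed(stage):
    # Run one stage and return its duration, using the monotonic clock.
    t0 = perf_counter()
    stage()
    return perf_counter() - t0

durations = {stage.__name__: timed(stage) for stage in (first_call, process_resp)}
time_info = ",\n".join("%s: %f" % (name, dt) for name, dt in durations.items())
```

Using perf_counter doesn't change the puzzle below, but it rules out clock adjustments as a source of odd interval numbers.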
The call in question is the first call, i.e. what happens during t1-t0.
When I port-forward the deployment and query it, the time_info looks like this:
time_info:
First call: 0.378481,
Process ES resp: 0.004755,
Second call: 0.022264,
Calculate signals: 0.018632,
Rank candidates: 0.005404
However, when I run a Jupyter notebook server on that cluster and query from there, the time_info becomes:
time_info:
First call: 0.198042,
Process ES resp: 0.004555,
Second call: 0.048502,
Calculate signals: 0.016378,
Rank candidates: 0.003466
I've sent hundreds of queries and can confirm this is the normal behavior; it has nothing to do with Elasticsearch warming up, etc.
You can see the first call became drastically faster. Note that this time_info is measured INSIDE the program, which runs on the same cluster as Elasticsearch, so any delay caused by port-forwarding shouldn't matter. Once a query comes in, wherever it came from, the program runs the same code path, yet the response time is almost halved. Can someone explain why this happens, and which number I can trust?