I have the following configuration:
For each of those requests, it is expected that the server takes about 2 to 3 minutes to respond.
My problem is that some responses issued by our service don't reach the client (the client timeouts after waiting 15mn), even though the client itself successfully sent its HTTP requests, and the service successfully processed it and responded. In particular I can see from the nginx logs that the response 200
is returned by our service, yet the client does not receive this response.
Here is an example of nginx log from which I deduce our service responded 200:
10.42.0.0 - [10.42.0.0] - - [10/Oct/2019:09:55:12 +0000] \"POST /api/upload/ot_35121G3118C_1.las HTTP/1.1\" 200 0 \"-\" \"reqwest/0.9.20\" 440258210 91.771 [file-upload-service-8000] [] 10.42.0.6:8000 0 15.368 200 583d70db408b6be596f5012e7893a3c3\n
For example, I tried to let the client perform requests continuously during 24h (waiting for the sever to respond before issuing a new request), and about one or two requests per hours have this problem while the 18 others behave as expected.
Because nginx tells me a 200
response was returned by our service, it feels like the response was lost somewhere between the nginx ingress and the client. Any idea about what causes this problem? Is there a way for me to debug this?
EDIT:
To clarify, the exact ingress controller I'm using is nginx-ingress ingress controller, deployed with helm
.
Also, the failure rate is completely random. The tendency is 1 or 2 per hour, but it can sometimes be more or less. In addition it does not seem to be correlated to the size of the file to upload nor to the number of requests that succeeded so far.