I have an app hosted on GKE which, among many tasks, serves zip files to clients. These zip files are constructed on the fly from many individual files on Google Cloud Storage.
The issue I'm facing is that when these zips get particularly large, the connection fails randomly partway through (anywhere between 1.4 GB and 2.5 GB). There doesn't seem to be any pattern in the timing either: it can happen anywhere from 2 to 8 minutes into the download.
AFAIK, the connection is disconnecting somewhere between the load balancer and my app. Is GKE ingress (load balancer) known to close long/large connections?
GKE setup:
More details/debugging steps:
statusDetails: "backend_connection_closed_after_partial_response_sent"
while the response has a 200 status code. Googling this turned up nothing helpful.

I believe your "backend_connection_closed_after_partial_response_sent" issue is caused by the WebSocket connection being killed prematurely by the back end. See the nginx documentation on WebSocket proxying, which explains the nature of this process. In short, by default the WebSocket connection is killed after 10 minutes.
Why does it work when you download the file directly from the pod? Because you're bypassing the load balancer, so the WebSocket connection is kept alive properly. Once the WebSocket is proxied, problems start to appear because WebSocket relies on hop-by-hop headers (Upgrade and Connection), which are not proxied.
A similar case was discussed here. It was resolved by sending ping frames from the back end to the client.
In my opinion, your best shot is to do the same. I've found many similar cases where a WebSocket was proxied, and most of them suggest using pings, because each ping resets the connection timer and keeps the connection alive.
Here's more about pinging the client using WebSocket and timeouts
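To make the ping suggestion concrete, here is a minimal sketch in Python of what a server-to-client ping frame looks like on the wire and how it could be sent periodically. The frame layout (FIN bit, opcode 0x9, unmasked, payload of at most 125 bytes) comes from RFC 6455. The names build_ping_frame and send_pings, and the send callable, are hypothetical and only there for illustration. In a real app your WebSocket library will most likely provide its own ping mechanism, so treat this as an explanation rather than something to drop in as-is.

```python
import asyncio
from typing import Awaitable, Callable

PING_OPCODE = 0x9  # WebSocket control opcode for "ping" (RFC 6455)


def build_ping_frame(payload: bytes = b"") -> bytes:
    """Build an unmasked ping frame, as sent from server to client."""
    if len(payload) > 125:
        raise ValueError("control frame payloads are limited to 125 bytes")
    first_byte = 0x80 | PING_OPCODE  # FIN bit set, opcode = ping
    second_byte = len(payload)       # mask bit is 0 for server-to-client frames
    return bytes([first_byte, second_byte]) + payload


async def send_pings(
    send: Callable[[bytes], Awaitable[None]],  # hypothetical: writes raw bytes to the socket
    interval: float,
    count: int,
) -> None:
    """Send `count` ping frames, sleeping `interval` seconds after each one."""
    for _ in range(count):
        await send(build_ping_frame(b"keepalive"))
        await asyncio.sleep(interval)
```

The interval should be comfortably shorter than whatever idle timeout sits between the client and the back end, so that some traffic always arrives before the timer expires.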
I work for Google, and this is as far as I can help you. If this doesn't resolve your issue, you will need to contact GCP support.