I have a GKE deployment consisting of pods that handle long-running, memory-intensive requests. This sits behind a NodePort Service, which in turn sits behind an Ingress (set up based on this tutorial).
I want to limit the number of concurrent connections per pod to avoid nodes running out of memory. This SO question suggests that it's possible with the Nginx Ingress controller.
Can this be achieved with GKE's default Ingress? I'm fairly new to Kubernetes and would like to limit complexity if possible.
Judging from its documentation, it's currently not possible to limit the number of concurrent connections per container with the GKE ingress controller.
So, to cap the maximum number of connections you'd have to follow the accepted answer of the post you mentioned, which uses nginxinc/kubernetes-ingress. That solution is arguably simpler than the accepted answer to a similar question, which relies on kubernetes/ingress-nginx.
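For reference, with nginxinc/kubernetes-ingress the limit is expressed as an annotation. A minimal sketch, assuming the `nginx.org/max-conns` annotation is supported by the controller version you deploy (the host, resource names, and limit value below are placeholders):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress                      # placeholder name
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.org/max-conns: "2"             # max concurrent connections per upstream pod
spec:
  rules:
  - host: app.example.com                # placeholder host
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: app-service            # placeholder Service name
            port:
              number: 80
```

Requests beyond the limit are queued or rejected by NGINX rather than reaching the pod, which is what keeps per-pod memory bounded.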
On the other hand, if you don't have a hard constraint on resources, you may not need this limit at all: for a given resource limit, having two concurrent connections to one container consumes roughly the same as having one connection to each of two containers. In that case you can read about the Cluster autoscaler and Autoscaling a cluster to understand how GKE implements one of the main ideas of Kubernetes, which is scaling.
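As a config fragment for the autoscaling route, enabling the cluster autoscaler on an existing GKE node pool looks roughly like this (the cluster name, node pool, and node bounds are placeholders; check the flags against the gcloud version you use):

```shell
gcloud container clusters update my-cluster \
  --enable-autoscaling \
  --node-pool default-pool \
  --min-nodes 1 \
  --max-nodes 5
```

With this in place, GKE adds nodes when pending pods can't be scheduled and removes them when they're underused, instead of you rationing connections per pod.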
No, you can't. You have to handle it on your own at the application level, or, as suggested in the other post, use another Ingress controller, over which you will have more control.
Now, if you think about it, limiting concurrent connections per pod doesn't solve the memory problem by itself, since a pod can use all of a node's resources unless you have set resource limits (which you can always modify). So two pods with one connection each would be equivalent to one pod with two connections (if your app can handle it). In fact, with two connections in one pod, your application would have more of the node's resources available to it.
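The resource limits mentioned above are set per container in the Deployment spec. A minimal sketch (all names, the image, and the memory values are placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: memory-heavy-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: memory-heavy-app
  template:
    metadata:
      labels:
        app: memory-heavy-app
    spec:
      containers:
      - name: worker
        image: example/worker:latest   # placeholder image
        resources:
          requests:
            memory: "512Mi"            # used for scheduling decisions
          limits:
            memory: "1Gi"              # container is OOM-killed above this
```

With a memory limit set, a runaway pod is killed instead of starving its node, which addresses the out-of-memory concern more directly than a connection cap.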
My point is that, from a resource standpoint, it doesn't really make sense to limit the number of connections per pod. That's just extra work for you.
I believe there are other scenarios where you'd want to force this limitation, for example enforcing it at the node level, in which case you are giving all of a node's resources to one connection at a time. And I believe you can achieve this with a queuing system.
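That queuing idea can be sketched as a FIFO queue drained by a single worker, so only one memory-intensive job runs at a time and each one gets the node's full resources. The structure and names here are illustrative, not a specific library's API:

```python
import queue
import threading

jobs = queue.Queue()   # pending (job_id, callable) pairs, processed in order
results = {}           # job_id -> result, filled in by the worker

def worker():
    """Drain the queue one job at a time; a (id, None) pair stops the worker."""
    while True:
        job_id, fn = jobs.get()
        if fn is None:
            break
        results[job_id] = fn()   # only one heavy job executes at any moment
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()
```

Callers enqueue work with `jobs.put(("some-id", some_callable))` and can block on `jobs.join()` until everything queued so far has finished. In production this role is usually played by a dedicated queue (e.g. a task queue or message broker) rather than an in-process thread.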