I am trying to expose an mlflow model in a GKE cluster through an ingress-nginx and a google cloud load balancer.
The configuration of service to the respective deployment looks as follows:
apiVersion: v1
kind: Service
metadata:
name: model-inference-service
labels:
app: inference
spec:
ports:
- port: 5555
targetPort: 5555
selector:
app: inference
When forwarding this service to localhost using kubectl port-forward service/model-inference-service 5555:5555
I can successfully query the model by sending a test image to the api endpoint using the following script.
The url the request is sent to is http://127.0.0.1:5555/invocations
.
This works as intended so I assume the deployment running the pod exposing the model and the corresponding clusterIP service model-inference-service
is configured correctly.
Next, I installed ingress-nxinx into the cluster by doing
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm install my-release ingress-nginx/ingress-nginx
The ingress is configured as follows (I suspect the error has to be here?):
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
annotations:
kubernetes.io/ingress.class: nginx
# nginx.ingress.kubernetes.io/rewrite-target: /invocations
name: inference-ingress
namespace: default
labels:
app: inference
spec:
rules:
- http:
paths:
- path: /invocations
backend:
serviceName: model-inference-service
servicePort: 5555
The ingress controller pod is running successfully:
my-release-ingress-nginx-controller-6758cc8f45-fwtw7 1/1 Running 0 3h33m
In the GCP console I can see that the load balancer was created successfully as well and I can optain its IP.
When using the same test script I used before to make a request to the Rest api endpoint (previously the service was forwarded to localhost) but now with the ip of the load balancer, I get a 502 Bad Gateway error:
The url is the following now: http://34.90.4.0:80/invocations
Traceback (most recent call last):
File "test_inference.py", line 80, in <module>
run()
File "//anaconda3/lib/python3.7/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "//anaconda3/lib/python3.7/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "//anaconda3/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "//anaconda3/lib/python3.7/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "test_inference.py", line 76, in run
print(score_model(data_path, host, port).text)
File "test_inference.py", line 54, in score_model
status_code=response.status_code, text=response.text
Exception: Status Code 502. <html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx/1.19.1</center>
</body>
</html>
When accessing the same url in a browser it says:
502 Bad Gateway
nginx/1.19.1
The logs of the ingress controller state:
2020/08/26 16:06:45 [warn] 86#86: *42282 a client request body is buffered to a temporary file /tmp/client-body/0000000009, client: 10.10.0.30, server: _, request: "POST /invocations HTTP/1.1", host: "34.90.4.0"
2020/08/26 16:06:45 [error] 86#86: *42282 connect() failed (111: Connection refused) while connecting to upstream, client: 10.10.0.30, server: _, request: "POST /invocations HTTP/1.1", upstream: "http://10.52.3.7:5555/invocations", host: "34.90.4.0"
2020/08/26 16:06:45 [error] 86#86: *42282 connect() failed (111: Connection refused) while connecting to upstream, client: 10.10.0.30, server: _, request: "POST /invocations HTTP/1.1", upstream: "http://10.52.3.7:5555/invocations", host: "34.90.4.0"
2020/08/26 16:06:45 [error] 86#86: *42282 connect() failed (111: Connection refused) while connecting to upstream, client: 10.10.0.30, server: _, request: "POST /invocations HTTP/1.1", upstream: "http://10.52.3.7:5555/invocations", host: "34.90.4.0"
10.10.0.30 - - [26/Aug/2020:16:06:45 +0000] "POST /invocations HTTP/1.1" 502 157 "-" "python-requests/2.24.0" 86151 0.738 [default-model-inference-service-5555] [] 10.52.3.7:5555, 10.52.3.7:5555, 10.52.3.7:5555 0, 0, 0 0.000, 0.001, 0.000 502, 502, 502 0d86e360427c0a81c287da4ff5e907bc
To test if the ingress and the load balancer work in principle I replaced the docker image with the real rest api I want to expose with this docker image which returns "hello world" on port 5050 and path /
. I changed the port and the path (from /invocations
to /
) in the service and ingress manifests shown above and could successfully see "hello world" when accessing the ip of the load balancer in the browser.
Does anyone see what I might have done wrong? Thank you very much!
Best regards,
F
The configuration you have shared is looking fine. There must be something in your cluster environment that is causing this behavior. See if pod-to-pod communication is working. Launch a test pod on the same node as the Nginx ingress controller and do a curl
from that pod to the target service. See if you get any DNS or Network issues. Try changing the host header when calling the service and see if it's sensitive to that.