I have an ingress providing routing for two microservices running on GKE, and intermittently when the microservice returns a 404/422, the ingress returns a 502.
Here is my ingress definition:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: basic-ingress
  annotations:
    kubernetes.io/ingress.global-static-ip-name: develop-static-ip
    ingress.gcp.kubernetes.io/pre-shared-cert: dev-ssl-cert
spec:
  rules:
  - http:
      paths:
      - path: /*
        backend:
          serviceName: srv
          servicePort: 80
      - path: /c/*
        backend:
          serviceName: collection
          servicePort: 80
      - path: /w/*
        backend:
          serviceName: collection
          servicePort: 80
I run tests that hit the srv back-end where I expect a 404 or 422 response. I have verified that when I hit the srv back-end directly (bypassing the ingress), the service responds correctly with the 404/422.
When I issue the same requests through the ingress, the ingress will intermittently respond with a 502 instead of the 404/422 coming from the back-end.
How can I have the ingress just return the 404/422 response from the back-end?
Here's some example code to demonstrate the behavior I'm seeing (the expected status is 404):
>>> for i in range(10):
...     resp = requests.get('https://<server>/a/v0.11/accounts/junk', cookies=<token>)
...     print(resp.status_code)
...
502
502
404
502
502
404
404
502
404
404
And here are the same requests issued from a Python prompt within the pod, i.e. bypassing the ingress:
>>> for i in range(10):
... resp = requests.get('http://0.0.0.0/a/v0.11/accounts/junk', cookies=<token>)
... print(resp.status_code)
...
404
404
404
404
404
404
404
404
404
404
Here's the output of the kubectl commands to demonstrate that the load balancer is set up correctly (I never get a 502 for a 2xx/3xx response from the microservice):
$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
srv-799976fbcb-4dxs7 2/2 Running 0 19m 10.24.3.8 gke-develop-default-pool-ea507abc-43h7 <none> <none>
srv-799976fbcb-5lh9m 2/2 Running 0 19m 10.24.1.7 gke-develop-default-pool-ea507abc-q0j3 <none> <none>
srv-799976fbcb-5zvmv 2/2 Running 0 19m 10.24.2.9 gke-develop-default-pool-ea507abc-jjzg <none> <none>
collection-5d9f8586d8-4zngz 2/2 Running 0 19m 10.24.1.6 gke-develop-default-pool-ea507abc-q0j3 <none> <none>
collection-5d9f8586d8-cxvgb 2/2 Running 0 19m 10.24.2.7 gke-develop-default-pool-ea507abc-jjzg <none> <none>
collection-5d9f8586d8-tzwjc 2/2 Running 0 19m 10.24.2.8 gke-develop-default-pool-ea507abc-jjzg <none> <none>
parser-7df86f57bb-9qzpn 1/1 Running 0 19m 10.24.0.8 gke-develop-parser-pool-5931b06f-6mcq <none> <none>
parser-7df86f57bb-g6d4q 1/1 Running 0 19m 10.24.5.5 gke-develop-parser-pool-5931b06f-9xd5 <none> <none>
parser-7df86f57bb-jchjv 1/1 Running 0 19m 10.24.0.9 gke-develop-parser-pool-5931b06f-6mcq <none> <none>
$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
srv NodePort 10.0.2.110 <none> 80:30141/TCP 129d
collection NodePort 10.0.4.237 <none> 80:30270/TCP 129d
kubernetes ClusterIP 10.0.0.1 <none> 443/TCP 130d
$ kubectl get endpoints
NAME ENDPOINTS AGE
srv 10.24.1.7:80,10.24.2.9:80,10.24.3.8:80 129d
collection 10.24.1.6:80,10.24.2.7:80,10.24.2.8:80 129d
kubernetes 35.237.239.186:443 130d
502 is a tricky status code: it can mean a context cancelled by the client or simply a bad gateway from the server you are trying to reach. In Kubernetes, a 502 usually means you cannot reach the service, so I would start by debugging your Services and Deployments (see the official debugging docs).
Use kubectl get pods -o wide to find your srv pods and note their pod IPs. Then make sure the Service is load balancing across the srv deployment: run kubectl get svc and look for the srv service. Finally, run kubectl get endpoints, get the IPs assigned to the srv endpoint, and match them against the pod IPs you noted earlier. If these all line up, you are correctly load balancing to your backend.
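If you want to script that comparison, here's a minimal sketch using the official kubernetes Python client (this is my addition, not part of the answer's procedure); it assumes the default namespace and that the srv pods carry an app=srv label, so adjust both to match your cluster:

from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() when running in a pod
v1 = client.CoreV1Api()

# Assumption: srv pods are labeled app=srv and live in the default namespace.
pod_ips = {
    pod.status.pod_ip
    for pod in v1.list_namespaced_pod("default", label_selector="app=srv").items
}

# Collect the IPs the srv Endpoints object is actually routing to.
endpoints = v1.read_namespaced_endpoints("srv", "default")
endpoint_ips = {
    address.ip
    for subset in (endpoints.subsets or [])
    for address in (subset.addresses or [])
}

print("pod IPs:     ", sorted(pod_ips))
print("endpoint IPs:", sorted(endpoint_ips))
print("in sync:", pod_ips == endpoint_ips)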
tl;dr: GCP LoadBalancer/GKE Ingress will 502 if 404/422s from the back-ends don't have response bodies.
Looking at the LoadBalancer logs, I would see the following errors:
502: backend_connection_closed_before_data_sent_to_client
404: backend_connection_closed_after_partial_response_sent
Since everything was configured correctly (even the LoadBalancer said the backends were healthy: the backend was working as expected and there were no failed health checks), I experimented with a few things and noticed that all of my 404 responses had empty bodies.
Sooo, I added a body to my 404 and 422 responses and, lo and behold, no more 502s!
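The question doesn't say what framework srv uses, so here is a minimal Flask sketch of the fix for illustration only (the route and the in-memory ACCOUNTS store are hypothetical stand-ins for the real service); the only point that matters is that the 404 and 422 responses carry a non-empty body:

from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical in-memory store standing in for the real account lookup.
ACCOUNTS = {"abc123": {"id": "abc123", "name": "example"}}

@app.route("/a/v0.11/accounts/<account_id>")
def get_account(account_id):
    account = ACCOUNTS.get(account_id)
    if account is None:
        # The fix: return a non-empty body with the 404.
        # An empty 404 body is what the load balancer was
        # intermittently turning into a 502.
        return jsonify(error="account not found"), 404
    return jsonify(account)

@app.route("/a/v0.11/accounts", methods=["POST"])
def create_account():
    payload = request.get_json(silent=True)
    if not payload or "name" not in payload:
        # Same treatment for 422: always include a body.
        return jsonify(error="missing required field: name"), 422
    return jsonify(payload), 201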
502 errors are expected when your backend service is returning 4xx errors: if the backend returns 4xx to its health checks, the health checks will fail, and if all backends are failing their health checks, the Load Balancer has no available backend to send traffic to and returns a 502.
For any 502 returned from the Load Balancer, I strongly recommend checking the Stackdriver logs for the HTTP Load Balancer. Every 502 entry includes a message alongside the response, and that message should clarify why the 502 was returned (there are a number of possible reasons).
In your current case, the 502 log entry should mention "failed_to_pick_backend" or "failed_to_connect_to_backend", or something along those lines. If you are using the nginx ingress controller, similar behavior can occur, but the 502 error message may read differently.
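If failing health checks on 4xx is the concern, one common way to decouple them from application errors is a dedicated health endpoint that always returns 200. A minimal sketch, again assuming a Flask service (the /healthz path is my assumption; point your readiness probe or the load balancer's health check at whatever path you choose):

from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/healthz")
def healthz():
    # Always 200: application routes can return 404/422 freely without
    # ever making the backend look unhealthy to the load balancer.
    return jsonify(status="ok"), 200

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=80)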