I'm having a hard time figuring out why an Ingress on GKE returns 502 errors and times out during a deployment on a project.
To better understand the issue, I have set up a basic hello application that goes through the same workflow.
Here is the complete manifest:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: helloapp
  labels:
    app: helloapp
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: helloapp
    spec:
      containers:
      - name: helloapp
        image: gcr.io/${GCLOUD_PROJECT_ID}/helloapp:${HELLOAPP_VERSION}
        imagePullPolicy: Always
        ports:
        - name: http-server
          containerPort: 8080
        readinessProbe:
          httpGet:
            path: /sys/health
            port: 8080
        livenessProbe:
          httpGet:
            path: /sys/health
            port: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: helloapp
  labels:
    app: helloapp
spec:
  type: NodePort
  externalTrafficPolicy: Local
  ports:
  - port: 80
    targetPort: http-server
  selector:
    app: helloapp
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: helloapp-http
spec:
  backend:
    serviceName: helloapp
    servicePort: 80
which contains an Ingress, a Service, and customized probes for the pods.
The application is a dead-simple hello-world server written in Go.
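For reference, the app behaves roughly like the following sketch; the handler contents and response bodies are assumptions, only the /sys/health path and port 8080 come from the manifest:

package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	// Health endpoint hit by the readiness/liveness probes and by siege.
	http.HandleFunc("/sys/health", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
		fmt.Fprintln(w, "ok")
	})

	// Hello-world endpoint.
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "Hello, world!")
	})

	// Listen on the containerPort declared in the Deployment.
	log.Fatal(http.ListenAndServe(":8080", nil))
}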
During a deployment, if I siege the ingress health-check endpoint of my application, I notice:
HTTP/1.1 502 9.02 secs: 332 bytes ==> GET /sys/health
HTTP/1.1 502 9.10 secs: 332 bytes ==> GET /sys/health
HTTP/1.1 200 4.70 secs: 473 bytes ==> GET /sys/health
HTTP/1.1 200 4.56 secs: 475 bytes ==> GET /sys/health
HTTP/1.1 200 0.01 secs: 475 bytes ==> GET /sys/health
HTTP/1.1 200 0.01 secs: 476 bytes ==> GET /sys/health
HTTP/1.1 200 0.03 secs: 475 bytes ==> GET /sys/health
HTTP/1.1 200 0.01 secs: 474 bytes ==> GET /sys/health
HTTP/1.1 200 4.58 secs: 475 bytes ==> GET /sys/health
HTTP/1.1 200 4.51 secs: 474 bytes ==> GET /sys/health
HTTP/1.1 200 0.01 secs: 475 bytes ==> GET /sys/health
HTTP/1.1 200 0.01 secs: 475 bytes ==> GET /sys/health
HTTP/1.1 200 4.83 secs: 474 bytes ==> GET /sys/health
HTTP/1.1 502 9.07 secs: 332 bytes ==> GET /sys/health
HTTP/1.1 200 0.02 secs: 475 bytes ==> GET /sys/health
After a few minutes (generally 5-10), the errors stop and the requests are forwarded correctly.
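For context, the output above comes from a plain siege run against the ingress; the exact flags aren't shown here, so the invocation below is only an assumption (one concurrent user for five minutes, with <ingress-ip> standing in for the ingress's external IP):

# Hammer the health-check endpoint through the ingress for five minutes.
siege -c 1 -t 5M http://<ingress-ip>/sys/health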
Cluster information:
All is fine with your configs. It looks like the problem happens while your Go app is starting: the round-robin on your Service sends some requests to pods whose app has not started yet, and those requests fail with a 502.
How long does your app take to start inside the pod? You can add initialDelaySeconds to the probes
to fix the error:
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: helloapp
    spec:
      containers:
      - name: helloapp
        image: gcr.io/${GCLOUD_PROJECT_ID}/helloapp:${HELLOAPP_VERSION}
        imagePullPolicy: Always
        ports:
        - name: http-server
          containerPort: 8080
        readinessProbe:
          httpGet:
            path: /sys/health
            port: 8080
          initialDelaySeconds: 60
        livenessProbe:
          httpGet:
            path: /sys/health
            port: 8080
          initialDelaySeconds: 120
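Once the probes are in place, one way to verify the behavior during a rollout is to watch the rollout status and pod readiness (the deployment name and label come from the manifest above):

# The rollout only completes once the new pods report Ready.
kubectl rollout status deployment/helloapp

# Watch readiness transitions of the pods during the deployment.
kubectl get pods -l app=helloapp -w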