Kubernetes Ingress Error: The server encountered a temporary error and could not complete your request

7/23/2018

In our GKE cluster we have a service called php-services. It is defined like so:

apiVersion: v1
kind: Service
metadata:
  name: php-services
  labels:
    name: php-services
spec:
  type: NodePort
  ports:
  - port: 80
  selector:
    name: php-services

I can access this service from inside the cluster. If I run these commands on one of our pods (in the default namespace), I get the expected results:

bash-4.4$ nslookup 'php-services'
   Name:      php-services
   Address 1: 10.15.250.136 php-services.default.svc.cluster.local

and

bash-4.4$ wget -q -O- 'php-services/health'
   {"status":"ok"}

So the service is ready and responding correctly. I need to expose this service to external traffic. I'm trying to do it with an Ingress using the following config:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ingress-tls
  annotations:
    kubernetes.io/ingress.class: "gce"
    kubernetes.io/tls-acme: "true"
    kubernetes.io/ingress.global-static-ip-name: "kubernetes-ingress"
    kubernetes.io/ingress.allow-http: "false"
    external-dns.alpha.kubernetes.io/hostname: "gke-ingress.goout.net"
  namespace: default
spec:
  tls:
  - hosts:
     - php.service.goout.net
    secretName: router-tls
  rules:
  - host: php.service.goout.net
    http:
      paths:
      - backend:
          serviceName: php-services
          servicePort: 80
        path: /*

But then accessing http://php.service.goout.net/health gives a 502 error:

Error: Server Error The server encountered a temporary error and could
not complete your request.
Please try again in 30 seconds.

We also have other services with the same config which run OK and are accessible from outside.

I've found a similar question, but it doesn't offer a sufficient answer either.
I've also been following the Debug Service article, but that didn't help either, as the service itself is OK.

Any help with this issue is highly appreciated.

-- Jen
google-cloud-platform
kubernetes

1 Answer

7/24/2018

EDIT TLDR

GKE Loadbalancer only accepts HTTP status 200 while Kubernetes health checks accept any code greater than or equal to 200 and less than 400.

Original answer

Ok, so we've figured out what was wrong.

Take a look at the YAML definition of the Deployment for the php-services service (shortened):

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: php-services
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      name: php-services
  template:
    metadata:
      labels:
        name: php-services
    spec:
      containers:
        - name: php-services
          image: IMAGE_TAG
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /health
              port: 80
              scheme: HTTP
            initialDelaySeconds: 60
            periodSeconds: 60
            successThreshold: 1
            timeoutSeconds: 10
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /health
              port: 80
              scheme: HTTP
            initialDelaySeconds: 60
            periodSeconds: 60
            successThreshold: 1
            timeoutSeconds: 10
          ports:
          - containerPort: 80

The Apache server inside the image was configured to redirect paths without a trailing slash to the same path with one. So a request for /health actually got HTTP status 301 pointing to /health/, which then responded with 200.

In the scope of Kubernetes health checks this is OK as "Any code greater than or equal to 200 and less than 400 indicates success."

However, the problem lay in the GKE Loadbalancer. It has its own health checks, derived from the checks in the Deployment definition. The important difference is that it only accepts HTTP status 200. And if the load balancer doesn't find a backend service healthy, it won't pass any external traffic to it.

Therefore we had two options to fix this:

  • Make the server inside the container respond with HTTP status 200 to both /health and /health/ (or more precisely just to /health)
  • or change the readinessProbe and livenessProbe path definition to /health/.

We chose the latter and it fixed the problem.
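
For reference, a minimal sketch of what the second option looks like in the container spec. It assumes nothing else in the Deployment above changes; only the two path values gain a trailing slash, and the timing and threshold fields are omitted here for brevity.

```yaml
# Probes for the php-services container after the change (sketch).
# The trailing slash makes Apache answer 200 directly instead of
# redirecting with 301, so the check generated for the GKE load
# balancer sees a healthy backend as well.
livenessProbe:
  httpGet:
    path: /health/   # was /health
    port: 80
    scheme: HTTP
readinessProbe:
  httpGet:
    path: /health/   # was /health
    port: 80
    scheme: HTTP
```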

-- Jen
Source: StackOverflow