Pod got CrashLoopBackOff in Kubernetes because of GCP service account

1/15/2020

After a deployment using Helm charts, I got a CrashLoopBackOff error:

    NAME                                   READY   STATUS             RESTARTS   AGE
    myproject-myproject-54ff57477d-h5fng   0/1     CrashLoopBackOff   10         24m

Then I described the pod to see its events, and I saw something like the following:

    Liveness probe failed: Get http://10.16.26.26:8080/status:
    dial tcp 10.16.26.26:8080: connect: connection refused

    Readiness probe failed: Get http://10.16.26.26:8080/status:
    dial tcp 10.16.26.26:8080: connect: connection refused

Lastly, I saw an invalid_grant error for access to my GCP cloud proxy in the logs, as below:

    time="2020-01-15T15:30:46Z" level=fatal msg=application_main error="Post https://www.googleapis.com/{....blabla.....}: oauth2: cannot fetch token: 400 Bad Request\nResponse: {\n \"error\": \"invalid_grant\",\n \"error_description\": \"Not a valid email or user ID.\"\n}"

However, when I checked my service account in IAM, it had access to the cloud proxy. Furthermore, I tested with the same credentials locally, and the readiness probe endpoint worked successfully.

Does anyone have any suggestions about my problem?

-- Denis
google-cloud-iam
google-cloud-platform
kubernetes

2 Answers

1/21/2020

Referring to the problem with granting access on GCP: fix this by using the email address (the string that ends with ...@developer.gserviceaccount.com) instead of the Client ID as the client_id parameter value. The naming Google chose here is confusing.
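For reference, both values sit next to each other in the service account's JSON key file, which is where the mix-up usually happens. A redacted sketch (all values here are placeholders):

    {
      "type": "service_account",
      "project_id": "my-project",
      "private_key_id": "0123456789abcdef",
      "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
      "client_email": "my-sa@my-project.iam.gserviceaccount.com",
      "client_id": "123456789012345678901",
      "token_uri": "https://oauth2.googleapis.com/token"
    }

It is the client_email value, not the numeric client_id, that should go into the client_id parameter mentioned above.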

You can find more information and troubleshooting guidance here: google-oauth-grant.

Referring to the problem with the probes:

Check whether the probe URL is healthy. Your probes may also be too sensitive: your application may take a while to start or respond.

Readiness and liveness probes can be used in parallel for the same container. Using both can ensure that traffic does not reach a container that is not ready for it, and that containers are restarted when they fail.

The liveness probe checks whether your application is healthy inside an already-running pod.

The readiness probe checks whether your pod is ready to receive traffic. Thus, if the endpoint at the probed path does not exist, the pod will never be reported as Ready.

e.g.:

          livenessProbe:
            httpGet:
              path: /your-path
              port: 5000
            failureThreshold: 1
            periodSeconds: 2
            initialDelaySeconds: 2
          ports:
            - name: http
              containerPort: 5000

If the endpoint /your-path does not exist, the pod will never become Ready.

Make sure that you have set up your liveness and readiness probes properly.
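As a rough sketch of both probes side by side (the path and port follow the /status endpoint from your logs; the timing values are assumptions to tune for your app):

      containers:
        - name: myproject                # hypothetical container name
          image: myproject:latest        # hypothetical image
          ports:
            - name: http
              containerPort: 8080
          readinessProbe:                # gates traffic to the pod
            httpGet:
              path: /status
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
          livenessProbe:                 # restarts the container on failure
            httpGet:
              path: /status
              port: 8080
            initialDelaySeconds: 30      # give the app time to start first
            periodSeconds: 10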

For an HTTP probe, the kubelet sends an HTTP request to the specified path and port to perform the check. The kubelet sends the probe to the pod's IP address, unless the address is overridden by the optional host field in httpGet. If the scheme field is set to HTTPS, the kubelet sends an HTTPS request and skips certificate verification. In most scenarios, you do not want to set the host field. Here is one scenario where you would: suppose the container listens on 127.0.0.1 and the pod's hostNetwork field is true; then host, under httpGet, should be set to 127.0.0.1. If your pod relies on virtual hosts, which is probably the more common case, you should not use host, but rather set the Host header in httpHeaders.
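For the virtual-host case, a minimal sketch (the host name here is made up):

      livenessProbe:
        httpGet:
          path: /status
          port: 8080
          httpHeaders:
            - name: Host
              value: myproject.example.com   # hypothetical virtual host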

For a TCP probe, the kubelet makes the probe connection at the node, not in the pod, which means that you cannot use a service name in the host parameter, since the kubelet is unable to resolve it.
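A minimal TCP probe sketch (the port is an assumption):

      livenessProbe:
        tcpSocket:
          port: 8080
        initialDelaySeconds: 15
        periodSeconds: 10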

The most important thing to configure when using liveness probes is the initialDelaySeconds setting.

Also make sure that the port the probe targets is actually open on the container.

A liveness probe failure causes the container to be restarted. You need to make sure the probe doesn't start until the app is ready; otherwise, the app will constantly restart and never become ready!

I recommend using your application's p99 startup time for initialDelaySeconds.
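For example, if your application's p99 startup time were around 25 seconds (a made-up figure), the probe could look like this:

      livenessProbe:
        httpGet:
          path: /status
          port: 8080
        initialDelaySeconds: 25   # roughly the p99 startup time (assumed)
        periodSeconds: 10
        failureThreshold: 3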

Take a look here: probes-kubernetes, most-common-fails-kubernetes-deployments.

-- MaggieO
Source: StackOverflow

1/16/2020

You can disable the liveness probe to stop the CrashLoopBackOff, exec into the container, and test from there. Ideally, you should not keep the same config for the liveness and readiness probes. It is not advisable for the liveness probe to depend on anything external; it should just check whether the pod itself is alive.
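For example, you could temporarily comment the probe out of the container spec while debugging (the container name, image, and endpoint below are placeholders), then kubectl exec into the pod and hit the endpoint from inside:

      containers:
        - name: myproject            # hypothetical container name
          image: myproject:latest    # hypothetical image
          # livenessProbe:           # disabled while debugging
          #   httpGet:
          #     path: /status
          #     port: 8080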

-- ffran09
Source: StackOverflow