CrashLoopBackOff Error when deploying Django app on GKE (Kubernetes)

12/6/2019

Folks,

What still persists: I have gotten past the CrashLoopBackOff by fixing the Dockerfile run command, as suggested by Emil Gi. However, the external IP is not forwarding to the library app server in my pod.

Status

  • Fixed the port to 8080 in the Dockerfile and ensured it is consistent across the Dockerfile and library.yaml.
  • Made sure the Dockerfile has proper commands so that the container doesn't terminate immediately after startup; this was what was causing the CrashLoopBackOff.
  • The remaining problem is that the load balancer's external IP gives this error when I click on it: "This site can’t be reached. 34.93.141.11 refused to connect."
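
A "refused to connect" message usually means the TCP connection was actively rejected rather than silently dropped, which can help narrow down where traffic stops. The following is a minimal sketch, not part of the original post, of a probe that tells the two cases apart. The function name `probe_tcp` and the 3-second timeout are illustrative choices.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connection attempt as 'open', 'refused', or 'timeout'."""
    try:
        # create_connection performs the full TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Something answered, but nothing accepts connections on this port
        return "refused"
    except socket.timeout:
        # No answer at all, e.g. packets dropped by a firewall
        return "timeout"
    except OSError as exc:
        # Any other network failure (unreachable host, DNS error, ...)
        return f"error: {exc}"
```

For the setup in this question, it could be called as `probe_tcp("34.93.141.11", 80)`, since the Service listens on port 80 and forwards to 8080 in the pod.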

Original Question:

How do I resolve this CrashLoopBackOff? I looked at many docs and tried debugging, but I am unsure what is causing it. The app runs perfectly locally and even deploys smoothly to App Engine standard, but not to GKE. Any pointers for debugging this further are most appreciated.

Problem: The cloudsql-proxy container is running, but the library-app container is in CrashLoopBackOff. The pod is assigned to a node, pulls and starts the images, and then goes into this BackOff state.

$ kubectl get pods
NAME                       READY   STATUS             RESTARTS   AGE
library-7699b84747-9skst   1/2     CrashLoopBackOff   28         121m

$ kubectl logs library-7699b84747-9skst 
Error from server (BadRequest): a container name must be specified for pod library-7699b84747-9skst, choose one of: [library-app cloudsql-proxy]
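
The BadRequest above appears because the pod has two containers, so `kubectl logs` needs a container name via `-c`. Adding `--previous` asks for the logs of the last terminated instance, which is often the interesting one during a crash loop. As a rough sketch (the helper name `kubectl_logs_cmd` is hypothetical), the command can be assembled like this:

```python
from typing import List


def kubectl_logs_cmd(pod: str, container: str, previous: bool = False) -> List[str]:
    """Build the argv for `kubectl logs` against one container of a pod."""
    cmd = ["kubectl", "logs", pod, "-c", container]
    if previous:
        # Logs of the previous (terminated) container instance
        cmd.append("--previous")
    return cmd


# Pod and container names taken from the output above
cmd = kubectl_logs_cmd("library-7699b84747-9skst", "library-app", previous=True)
print(" ".join(cmd))
```

This prints `kubectl logs library-7699b84747-9skst -c library-app --previous`, which can be pasted into a shell or passed to `subprocess.run` on a machine where kubectl is configured for the cluster.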

$ kubectl describe pods library-7699b84747-9skst
Name:               library-7699b84747-9skst
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               gke-library-default-pool-35b5943a-ps5v/10.160.0.13
Start Time:         Fri, 06 Dec 2019 09:34:11 +0530
Labels:             app=library
                    pod-template-hash=7699b84747
Annotations:        kubernetes.io/limit-ranger: LimitRanger plugin set: cpu request for container library-app; cpu request for container cloudsql-proxy
Status:             Running
IP:                 10.16.0.10
Controlled By:      ReplicaSet/library-7699b84747
Containers:
  library-app:
    Container ID:   docker://e7d8aac3dff318de34f750c3f1856cd754aa96a7203772de748b3e397441a609
    Image:          gcr.io/library-259506/library
    Image ID:       docker-pullable://gcr.io/library-259506/library@sha256:07f54e055621ab6ddcbb49666984501cf98c95133bcf7405ca076322fb0e4108
    Port:           8080/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 06 Dec 2019 09:35:07 +0530
      Finished:     Fri, 06 Dec 2019 09:35:07 +0530
    Ready:          False
    Restart Count:  2
    Requests:
      cpu:  100m
    Environment:
      DATABASE_USER:      <set to the key 'username' in secret 'cloudsql'>  Optional: false
      DATABASE_PASSWORD:  <set to the key 'password' in secret 'cloudsql'>  Optional: false
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-kj497 (ro)
  cloudsql-proxy:
    Container ID:  docker://352284231e7f02011dd1ab6999bf9a283b334590435278442e9a04d4d0684405
    Image:         gcr.io/cloudsql-docker/gce-proxy:1.16
    Image ID:      docker-pullable://gcr.io/cloudsql-docker/gce-proxy@sha256:7d302c849bebee8a3fc90a2705c02409c44c91c813991d6e8072f092769645cf
    Port:          <none>
    Host Port:     <none>
    Command:
      /cloud_sql_proxy
      --dir=/cloudsql
      -instances=library-259506:asia-south1:library=tcp:3306
      -credential_file=/secrets/cloudsql/credentials.json
    State:          Running
      Started:      Fri, 06 Dec 2019 09:34:51 +0530
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:        100m
    Environment:  <none>
    Mounts:
      /cloudsql from cloudsql (rw)
      /etc/ssl/certs from ssl-certs (rw)
      /secrets/cloudsql from cloudsql-oauth-credentials (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-kj497 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  cloudsql-oauth-credentials:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cloudsql-oauth-credentials
    Optional:    false
  ssl-certs:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/ssl/certs
    HostPathType:  
  cloudsql:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:  
  default-token-kj497:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-kj497
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age               From                                             Message
  ----     ------     ----              ----                                             -------
  Normal   Scheduled  86s               default-scheduler                                Successfully assigned default/library-7699b84747-9skst to gke-library-default-pool-35b5943a-ps5v
  Normal   Pulling    50s               kubelet, gke-library-default-pool-35b5943a-ps5v  pulling image "gcr.io/cloudsql-docker/gce-proxy:1.16"
  Normal   Pulled     47s               kubelet, gke-library-default-pool-35b5943a-ps5v  Successfully pulled image "gcr.io/cloudsql-docker/gce-proxy:1.16"
  Normal   Created    46s               kubelet, gke-library-default-pool-35b5943a-ps5v  Created container
  Normal   Started    46s               kubelet, gke-library-default-pool-35b5943a-ps5v  Started container
  Normal   Pulling    2s (x4 over 85s)  kubelet, gke-library-default-pool-35b5943a-ps5v  pulling image "gcr.io/library-259506/library"
  Normal   Created    1s (x4 over 50s)  kubelet, gke-library-default-pool-35b5943a-ps5v  Created container
  Normal   Started    1s (x4 over 50s)  kubelet, gke-library-default-pool-35b5943a-ps5v  Started container
  Normal   Pulled     1s (x4 over 52s)  kubelet, gke-library-default-pool-35b5943a-ps5v  Successfully pulled image "gcr.io/library-259506/library"
  Warning  BackOff    1s (x5 over 43s)  kubelet, gke-library-default-pool-35b5943a-ps5v  Back-off restarting failed container
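
In the describe output, the last state of library-app is Terminated with Reason "Completed" and Exit Code 0, and it started and finished within the same second. That pattern suggests the main process ran to completion instead of crashing. Pods created by a Deployment always use restartPolicy Always, so even a clean exit gets restarted and ends up in back-off. Below is a small sketch of that reading. The helper `explain_last_state` is hypothetical and only a rough rule of thumb.

```python
def explain_last_state(reason: str, exit_code: int) -> str:
    """Give a rough reading of a container's 'Last State: Terminated' fields."""
    if exit_code == 0:
        # Clean exit: the command finished instead of staying in the foreground
        return ("process exited cleanly; under a Deployment (restartPolicy: Always) "
                "the kubelet still restarts it, which shows up as CrashLoopBackOff")
    if reason == "OOMKilled" or exit_code == 137:
        # 137 = 128 + 9, i.e. the process received SIGKILL
        return "process was killed, possibly for exceeding its memory limit"
    return f"process failed with exit code {exit_code}; check the container logs"


# Values taken from the describe output above
print(explain_last_state("Completed", 0))
```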

Here is the library.yaml file I have to go with it.

# [START kubernetes_deployment]
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: library
  labels:
    app: library
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: library
    spec:
      containers:
      - name: library-app
        # Replace  with your project ID or use `make template`
        image: gcr.io/library-259506/library
        # This setting makes nodes pull the docker image every time before
        # starting the pod. This is useful when debugging, but should be turned
        # off in production.
        imagePullPolicy: Always
        env:
            # [START cloudsql_secrets]
            - name: DATABASE_USER
              valueFrom:
                secretKeyRef:
                  name: cloudsql
                  key: username
            - name: DATABASE_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: cloudsql
                  key: password
            # [END cloudsql_secrets]
        ports:
        - containerPort: 8080

      # [START proxy_container]
      - image: gcr.io/cloudsql-docker/gce-proxy:1.16
        name: cloudsql-proxy
        command: ["/cloud_sql_proxy", "--dir=/cloudsql", 
                  "-instances=library-259506:asia-south1:library=tcp:3306",
                  "-credential_file=/secrets/cloudsql/credentials.json"]
        volumeMounts:
          - name: cloudsql-oauth-credentials
            mountPath: /secrets/cloudsql
            readOnly: true
          - name: ssl-certs
            mountPath: /etc/ssl/certs
          - name: cloudsql
            mountPath: /cloudsql
      # [END proxy_container] 
      # [START volumes]
      volumes:
        - name: cloudsql-oauth-credentials
          secret:
            secretName: cloudsql-oauth-credentials
        - name: ssl-certs
          hostPath:
            path: /etc/ssl/certs
        - name: cloudsql
          emptyDir:
      # [END volumes]        
# [END kubernetes_deployment]

---
    # [START service]
    # The library-svc service provides a load-balancing proxy over the polls app
    # pods. By specifying the type as a 'LoadBalancer', Container Engine will
    # create an external HTTP load balancer.
    # The service directs traffic to the deployment by matching the service's selector to the deployment's label
    #
    # For more information about external HTTP load balancing see:
    # https://cloud.google.com/container-engine/docs/load-balancer
    apiVersion: v1
    kind: Service
    metadata:
      name: library-svc
    spec:
      type: LoadBalancer
      ports:
      - port: 80
        targetPort: 8080
      selector:
        app: library

    # [END service]
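
For the external IP to reach the app, the Service's selector has to match the pod labels and its targetPort has to be a port the container actually listens on. As an illustrative sketch, the check below works on plain dicts that mirror the relevant parts of library.yaml. The function name `check_service_wiring` is hypothetical, and only numeric targetPort values are handled.

```python
# Trimmed-down copies of the manifests above, written as Python dicts
DEPLOYMENT = {
    "spec": {"template": {
        "metadata": {"labels": {"app": "library"}},
        "spec": {"containers": [
            {"name": "library-app", "ports": [{"containerPort": 8080}]},
            {"name": "cloudsql-proxy"},
        ]},
    }}
}
SERVICE = {
    "spec": {
        "selector": {"app": "library"},
        "ports": [{"port": 80, "targetPort": 8080}],
    }
}


def check_service_wiring(deployment: dict, service: dict) -> list:
    """Return a list of mismatches between a Service and a Deployment's pods."""
    problems = []
    template = deployment["spec"]["template"]
    pod_labels = template["metadata"].get("labels", {})

    # Every selector key/value must be present on the pod labels
    for key, value in service["spec"].get("selector", {}).items():
        if pod_labels.get(key) != value:
            problems.append(f"selector {key}={value} not found in pod labels {pod_labels}")

    # containerPort is informational, so a miss here is a hint, not proof
    declared = {
        p["containerPort"]
        for c in template["spec"]["containers"]
        for p in c.get("ports", [])
    }
    for svc_port in service["spec"]["ports"]:
        # targetPort defaults to the value of port when omitted
        target = svc_port.get("targetPort", svc_port["port"])
        if target not in declared:
            problems.append(f"targetPort {target} is not a declared containerPort {sorted(declared)}")
    return problems


print(check_service_wiring(DEPLOYMENT, SERVICE))
```

With the values from library.yaml the list is empty, which suggests the wiring itself is consistent and the cause of the refused connection lies elsewhere, for example in the pod not being Ready.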

More error status

CrashLoopBackOff

Reason: Container 'library-app' keeps crashing. Check Pod's logs to see more details.
Source: library-7699b84747-9skst

Conditions: Initialized: True, Ready: False, ContainersReady: False, PodScheduled: True

  - lastProbeTime: null
    lastTransitionTime: "2019-12-06T06:03:43Z"
    message: 'containers with unready status: [library-app]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady

Key Events

Message                                        Reason   First seen               Last seen                 Count
Back-off restarting failed container           BackOff  Dec 6, 2019, 9:34:54 AM  Dec 6, 2019, 12:24:26 PM  779
pulling image "gcr.io/library-259506/library"  Pulling  Dec 6, 2019, 9:34:12 AM  Dec 6, 2019, 11:59:26 AM  34

The Dockerfile is as follows (this fixed the CrashLoop btw):

FROM python:3
ENV PYTHONUNBUFFERED=1
RUN mkdir /code
WORKDIR /code
COPY requirements.txt /code/
RUN pip install -r requirements.txt
COPY . /code/

# Server
EXPOSE 8080
STOPSIGNAL SIGINT
ENTRYPOINT ["python", "manage.py"]
CMD ["runserver", "0.0.0.0:8080"]
-- Sudhakar R
django
google-kubernetes-engine
kubernetes

1 Answer

12/7/2019

I think a bunch of things all came together:

  • I found that the database password had a special character that needed to be put within quotes, and I then made sure the port numbers were consistent across the Dockerfile and library.yaml. This ensured the secrets actually worked; I had detected a password mismatch in the logs.
  • IMPORTANT: the command-line fix from Emil Gi, which ensures the container built from my Dockerfile doesn't exit right away. Make sure the CMD actually works and runs your server.
  • IMPORTANT: Finally, I found a fix for the external IP not connecting to my server. See this thread, where I explain what went wrong: "RunAsUser issue & Clicking external IP of load balancer -> Bad Request (400) on deploying Django app on GKE (Kubernetes) and db connection failing". Basically, I needed a security context in which runAs is set so that the container does not run as root.
  • I also documented all steps to deploy step 1-15 and
-- Sudhakar R
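
As a rough illustration of how the secrets reach Django, here is a minimal sketch of database settings built from the DATABASE_USER and DATABASE_PASSWORD variables that library.yaml injects. Several parts are assumptions not confirmed by the post: the MySQL engine (guessed from the proxy listening on tcp:3306), the `DATABASE_NAME` variable with its `library` default, and the helper name `database_settings`. Containers in the same pod share a network namespace, so the app reaches the proxy on 127.0.0.1.

```python
import os


def database_settings(environ=os.environ) -> dict:
    """Build a Django DATABASES dict from environment variables."""
    return {
        "default": {
            # Assumption: MySQL, inferred from the proxy's tcp:3306 setting
            "ENGINE": "django.db.backends.mysql",
            # Hypothetical variable; the post does not name the database
            "NAME": environ.get("DATABASE_NAME", "library"),
            # These two come from the 'cloudsql' secret in library.yaml;
            # a KeyError here means the secret was not injected
            "USER": environ["DATABASE_USER"],
            "PASSWORD": environ["DATABASE_PASSWORD"],
            # The cloudsql-proxy sidecar listens locally inside the pod
            "HOST": "127.0.0.1",
            "PORT": "3306",
        }
    }
```

In settings.py this would be used as `DATABASES = database_settings()`. The password is read from the environment unchanged, so any quoting of special characters has to happen when the secret is created, not in the Django settings.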
Source: StackOverflow