How to debug crash-looping pods in OpenShift?

2/24/2020

I have a simple DeploymentConfig:

apiVersion: v1
kind: DeploymentConfig
metadata:
  name: my-dc
  labels:
    app: my
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: my
    spec:
      containers:
        - name: my
          image: image-registry.openshift-image-registry.svc:5000/pc/rhel-atomic
          livenessProbe:
            exec:
              command:
                - echo
                - "I'm alive!"
            initialDelaySeconds: 10
          readinessProbe:
            exec:
              command:
                - echo
                - "I'm healthy!"
            initialDelaySeconds: 15
            periodSeconds: 15

The image-registry.openshift-image-registry.svc:5000/pc/rhel-atomic image stream points to my own image that is simply:

FROM registry.access.redhat.com/rhel7/rhel-atomic

When I run oc create -f my-dc.yaml and check what is going on, I see that my pods are crash-looping.

To debug it, I ran oc status --suggest. It suggests listing the container logs with oc logs my-dc-1-z889c -c my. However, there are no logs for any of the containers.
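For a crash-looping pod, a few standard commands usually surface the failure reason even when the current log stream is empty. A sketch, using one of the pod names from the events below (field paths may vary slightly by cluster version):

```shell
# Logs from the previous (crashed) container instance, if it wrote any
oc logs my-dc-1-vnhmp -c my --previous

# Detailed pod state, including the last termination reason and exit code
oc describe pod my-dc-1-vnhmp

# Just the exit code of the last terminated container
oc get pod my-dc-1-vnhmp \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
```

An exit code of 0 with no logs typically means the container's entrypoint finished immediately rather than crashed.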

My oc get events does not help either. It just cycles through these messages:

<unknown>   Normal    Scheduled                     pod/my-dc-1-vnhmp                     Successfully assigned pc/my-dc-1-vnhmp to ip-10-0-128-37.ec2.internal
31m         Normal    Pulling                       pod/my-dc-1-vnhmp                     Pulling image "image-registry.openshift-image-registry.svc:5000/pc/rhel-atomic"
31m         Normal    Pulled                        pod/my-dc-1-vnhmp                     Successfully pulled image "image-registry.openshift-image-registry.svc:5000/pc/rhel-atomic"
31m         Normal    Created                       pod/my-dc-1-vnhmp                     Created container my
31m         Normal    Started                       pod/my-dc-1-vnhmp                     Started container my
27m         Warning   BackOff                       pod/my-dc-1-vnhmp                     Back-off restarting failed container
<unknown>   Normal    Scheduled                     pod/my-dc-1-z8jgb                     Successfully assigned pc/my-dc-1-z8jgb to ip-10-0-169-70.ec2.internal
31m         Normal    Pulling                       pod/my-dc-1-z8jgb                     Pulling image "image-registry.openshift-image-registry.svc:5000/pc/rhel-atomic"
31m         Normal    Pulled                        pod/my-dc-1-z8jgb                     Successfully pulled image "image-registry.openshift-image-registry.svc:5000/pc/rhel-atomic"
31m         Normal    Created                       pod/my-dc-1-z8jgb                     Created container my
31m         Normal    Started                       pod/my-dc-1-z8jgb                     Started container my
27m         Warning   BackOff                       pod/my-dc-1-z8jgb                     Back-off restarting failed container

How do I debug this? Why do the containers crash?

I am using OpenShift Online.

-- foki
kubernetes
openshift

1 Answer

2/24/2020

It seems that the single container in the pod has no long-running process, so the container exits right after it starts.

An option to keep the container running is to add these to the DeploymentConfig for the container:

        command:
        - /bin/sh
        stdin: true

Replace /bin/sh with a different shell (e.g. /bin/bash) or a different path, depending on what is available in the image.
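Applied to the DeploymentConfig from the question, the container section would look like the sketch below. The tty field is an optional addition (it allocates a terminal alongside stdin); command: ["sleep", "infinity"] is an alternative if an interactive shell is not needed:

```yaml
      containers:
        - name: my
          image: image-registry.openshift-image-registry.svc:5000/pc/rhel-atomic
          # Keep a foreground process running so the container does not exit
          command:
            - /bin/sh
          stdin: true
          tty: true
```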

-- gears
Source: StackOverflow