Why do I get CrashLoopBackOff while running TensorFlow in Kubernetes?

4/29/2021

Here is my pod.yaml configuration:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: tensorflow
  labels:
    app: tensorflow
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tensorflow
  template:
    metadata:
      labels:
        app: tensorflow
    spec:
      containers:
      - name: tensorflow
        image: tensorflow/tensorflow:latest
        ports:
        - containerPort: 8888

I get a CrashLoopBackOff error when I try to create it. Can anyone help with this? Am I doing anything wrong?

-- Ramu
kubernetes
tensorflow

2 Answers

4/30/2021

If you check the output of kubectl describe pod tensorflow-**********-*****, you will see that the last state was Terminated with Exit Code 0. That means your container launched successfully, did its job, and exited successfully.

In addition, restartPolicy: Always is the default for Deployments, and you can't set restartPolicy: Never. More info here: deployments do not support (honor) container restartPolicy

Always means that the container will be restarted even if it exited with a zero exit code (i.e. successfully). That's why you see restarts and CrashLoopBackOff.

    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 29 Apr 2021 23:44:03 +0000
      Finished:     Thu, 29 Apr 2021 23:44:03 +0000
    Ready:          False
    Restart Count:  2
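
If the workload really is meant to run once and exit, a Job is the resource kind that accepts restartPolicy: Never (or OnFailure). The manifest below is only a minimal sketch of that idea. The Job name, the backoffLimit value, and the python3 one-liner are illustrative assumptions and are not part of the original question.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: tensorflow-version-check   # hypothetical name, chosen for illustration
spec:
  backoffLimit: 2                  # assumed value: retry a failed pod at most twice
  template:
    spec:
      restartPolicy: Never         # allowed in a Job, unlike in a Deployment
      containers:
      - name: tensorflow
        image: tensorflow/tensorflow:latest
        # Assumed example command: print the TensorFlow version, then exit with code 0
        command: ["python3", "-c", "import tensorflow as tf; print(tf.__version__)"]
```

With a Job, a pod that exits with code 0 is reported as Completed and is not restarted.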

To keep your TensorFlow pod from finishing, you can add any kind of infinite loop to the Deployment, e.g.:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: tensorflow
  labels:
    app: tensorflow
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tensorflow
  template:
    metadata:
      labels:
        app: tensorflow
    spec:
      containers:
      - name: tensorflow
        image: tensorflow/tensorflow:latest
        ports:
        - containerPort: 8888
        command: ["/bin/sh", "-ec", "while :; do echo '.'; sleep 5 ; done"]

Result:

kubectl get pod tensorflow-788846c588-p64rl
NAME                          READY   STATUS    RESTARTS   AGE
tensorflow-788846c588-p64rl   1/1     Running   0          4m23s
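
Since the manifest exposes port 8888, the intent may have been to run a Jupyter server rather than an idle loop. As a hedged alternative, the sketch below swaps in the tensorflow/tensorflow:latest-jupyter tag, assuming (this is not confirmed anywhere in this thread) that this tag starts a long-running Jupyter server on port 8888 as its default command. Everything else is copied from the Deployment above.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tensorflow
  labels:
    app: tensorflow
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tensorflow
  template:
    metadata:
      labels:
        app: tensorflow
    spec:
      containers:
      - name: tensorflow
        # Assumption: the -jupyter variant keeps a server process in the foreground,
        # so the container does not exit and no custom command is needed
        image: tensorflow/tensorflow:latest-jupyter
        ports:
        - containerPort: 8888
```

If that assumption holds, the container's main process keeps running and the pod stays in the Running state without the sleep loop.
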
-- Vit
Source: StackOverflow

4/29/2021

If the pod is in a crash loop, it means that it is constantly starting and dying. This means that your config is valid from the k8s perspective.

It is quite hard to tell without any logs. Can you run kubectl describe pod <pod-name> and kubectl logs <pod-name>?

As you are running TensorFlow, does your container need GPU support?

-- antaxify
Source: StackOverflow