Kubernetes CronJob pod never exits

6/10/2021

I am running a CronJob in Kubernetes. The job starts but never exits, and the pod status is always Running. The output is below.

kubectl get pods
cronjob-1623253800-xnwwx   1/1     Running            0          13h

When I describe the Job, I see the following:

kubectl describe job cronjob-1623300120

Name:           cronjob-1623300120
Namespace:      cronjob
Selector:      xxxxx 
Labels:         xxxxx
Annotations:    <none>
Controlled By:  CronJob/cronjob
Parallelism:    1
Completions:    1
Start Time:     Thu, 9 Jun 2021 10:12:03 +0530
Pods Statuses:  1 Running / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  app=cronjob
           controller-xxxx
           job-name=cronjob-1623300120
  Containers:
   plannercronjob:
    Image:      xxxxxxxxxxxxx
    Port:       <none>
    Host Port:  <none>
    Mounts:                             <none>
  Volumes:                              <none>
Events:
  Type    Reason            Age    From            Message
  ----    ------            ----   ----            -------
  Normal  SuccessfulCreate  13h  job-controller  Created pod: cronjob-1623300120

I noticed that Pods Statuses shows 1 Running / 0 Succeeded / 0 Failed. Does this mean the Job is marked Succeeded or Failed only after the code returns an exit code (zero for success)? Is that correct?
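For reference, a minimal sketch of how a container's exit code maps to the counts in Pods Statuses. It assumes the pod template uses restartPolicy: Never (the describe output does not show the restart policy). The classify and run_and_classify helpers are hypothetical names used only for illustration; they mimic the rule locally with child processes and are not a Kubernetes API.

```python
import subprocess
import sys


def classify(returncode):
    """Map a main-process exit code to the phase the Job would count.

    None means the process has not exited yet, which is the
    "1 Running / 0 Succeeded / 0 Failed" situation.
    """
    if returncode is None:
        return "Running"
    return "Succeeded" if returncode == 0 else "Failed"


def run_and_classify(code_snippet):
    """Run a Python snippet as a child process and classify its exit code."""
    result = subprocess.run(
        [sys.executable, "-c", code_snippet],
        stderr=subprocess.DEVNULL,  # hide the child's traceback in this demo
    )
    return classify(result.returncode)


if __name__ == "__main__":
    # A clean exit counts as Succeeded.
    print(run_and_classify("import sys; sys.exit(0)"))
    # An uncaught exception makes Python exit with a non-zero code, so Failed.
    print(run_and_classify("raise RuntimeError('boom')"))
    # A process that never exits has no exit code at all.
    print(classify(None))
```

So a pod that stays in Running has not returned any exit code yet; the Job controller has nothing to count as Succeeded or Failed until the main process ends.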

When I open a shell in the pod using the exec command:

kubectl exec --stdin --tty cronjob-1623253800-xnwwx -n cronjob -- /bin/bash

root@cronjob-1623253800-xnwwx:/# ps ax| grep python
    1 ?        Ssl    0:01 python -m sfit.src.app
   18 pts/0    S+     0:00 grep python

I found that the Python process is still running. Is this a problem in the code, such as a deadlock, or something else?
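One way to tell a hang from slow work is to have the process dump its own stack traces and exit if it runs too long. Below is a minimal sketch using the standard library faulthandler module. MAX_RUNTIME_SECONDS, run_with_watchdog and example_job are hypothetical names, and the 600 second limit is an assumed value, not something taken from the job above.

```python
import faulthandler

# Assumed upper bound for one run of the job; pick a value that fits the workload.
MAX_RUNTIME_SECONDS = 600


def run_with_watchdog(job, max_runtime_seconds=MAX_RUNTIME_SECONDS):
    """Run job(); if it is still running after the limit, dump the stack
    traces of all threads to stderr and exit the process with status 1."""
    faulthandler.dump_traceback_later(max_runtime_seconds, exit=True)
    try:
        return job()
    finally:
        # The job returned or raised in time, so disarm the watchdog.
        faulthandler.cancel_dump_traceback_later()


def example_job():
    """Hypothetical stand-in for the real entry point of the application."""
    return "done"


if __name__ == "__main__":
    print(run_with_watchdog(example_job))
```

If the real entry point is wrapped this way, a stuck run should show up in kubectl logs as a traceback pointing at the blocking call, and the non-zero exit lets the Job be counted as Failed instead of staying in Running.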

Output of describing the pod:
Name:         cronjob-1623302220-xnwwx
Namespace:    default
Priority:     0
Node:         aks-agentpool-xxxxvmss000000/10.240.0.4
Start Time:   Thu, 9 Jun 2021 10:47:02 +0530
Labels:       app=cronjob
              controller-uid=xxxxxx
              job-name=cronjob-1623302220
Annotations:  <none>
Status:       Running
IP:           10.244.1.30
IPs:
  IP:           10.244.1.30
Controlled By:  Job/cronjob-1623302220
Containers:
  plannercronjob:
    Container ID:   docker://xxxxxxxxxxxxxxxx
    Image: xxxxxxxxxxx
    Image ID:       docker-xxxx
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Thu, 9 Jun 2021 10:47:06 +0530
    Ready:          True
    Restart Count:  0
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-97xzv (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  default-token-97xzv:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-97xzv
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From                                        Message
  ----    ------     ----  ----                                        -------
  Normal  Scheduled  13h   default-scheduler                           Successfully assigned cronjob/cronjob-1623302220-xnwwx to aks-agentpool-xxx-vmss000000
  Normal  Pulling    13h   kubelet, aks-agentpool-xxx-vmss000000  Pulling image "xxxx.azurecr.io/xxx:1.1.1"
  Normal  Pulled     13h   kubelet, aks-agentpool-xxx-vmss000000  Successfully pulled image "xxx.azurecr.io/xx:1.1.1"
  Normal  Created    13h   kubelet, aks-agentpool-xxx-vmss000000  Created container cronjob
  Normal  Started    13h   kubelet, aks-agentpool-xxx-vmss000000  Started container cronjob

@KrishnaChaurasia I ran the Docker image on my own system. There is an error in my Python code, and locally the process exits with that error. In Kubernetes, however, it does not exit and does not stop.

docker run xxxxx/cronjob:1    
 File "/usr/local/lib/python3.8/site-packages/azure/core/pipeline/transport/_requests_basic.py", line 261, in send
        raise error
    azure.core.exceptions.ServiceRequestError: <urllib3.connection.HTTPSConnection object at 0x7f113f6480a0>: Failed to establish a new connection: [Errno -2] Name or service not known

echo $?
1

-- galiylama
kubernetes
kubernetes-cronjob
kubernetes-pod

1 Answer

12/30/2021

If your pod is always running and never completes, try adding startingDeadlineSeconds.

https://medium.com/@hengfeng/what-does-kubernetes-cronjobs-startingdeadlineseconds-exactly-mean-cc2117f9795f
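A minimal sketch of where that field sits in a CronJob manifest, built as a Python dict and printed as JSON. The build_cronjob_manifest helper, the schedule and both deadline values are assumptions for illustration; the name, namespace, container name and image placeholder are taken from the question. Note that startingDeadlineSeconds only limits how late a missed run may still be started. The sketch also sets activeDeadlineSeconds on the Job template, which is the field that ends a run that is still active after the given time; this goes beyond the suggestion above. The apiVersion batch/v1 assumes Kubernetes 1.21 or later; older clusters use batch/v1beta1.

```python
import json


def build_cronjob_manifest(image, schedule, starting_deadline_seconds,
                           active_deadline_seconds):
    """Return a CronJob manifest as a plain dict (hypothetical helper)."""
    return {
        "apiVersion": "batch/v1",  # assumption: cluster is 1.21 or newer
        "kind": "CronJob",
        "metadata": {"name": "cronjob", "namespace": "cronjob"},
        "spec": {
            "schedule": schedule,
            # How many seconds after its scheduled time a missed run may still start.
            "startingDeadlineSeconds": starting_deadline_seconds,
            # Do not start a new run while the previous one is still active.
            "concurrencyPolicy": "Forbid",
            "jobTemplate": {
                "spec": {
                    # Terminate the Job's pods once it has been active this long.
                    "activeDeadlineSeconds": active_deadline_seconds,
                    "backoffLimit": 0,
                    "template": {
                        "spec": {
                            "restartPolicy": "Never",
                            "containers": [
                                {"name": "plannercronjob", "image": image}
                            ],
                        }
                    },
                }
            },
        },
    }


if __name__ == "__main__":
    manifest = build_cronjob_manifest(
        image="xxxxx/cronjob:1",   # placeholder image name from the question
        schedule="*/5 * * * *",    # assumed schedule
        starting_deadline_seconds=200,
        active_deadline_seconds=600,
    )
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and passed to kubectl apply -f, since kubectl accepts JSON as well as YAML.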

-- Patrick Ding
Source: StackOverflow