Why do I get error 137 when running an exec lifecycle hook in a Kubernetes CronJob?

8/21/2019

I have this spec for a Kubernetes CronJob:

---
kind: CronJob
apiVersion: batch/v1beta1
metadata:
  name: do-registry-cleanup

spec:
  schedule: "* * * * *"
  successfulJobsHistoryLimit: 2
  failedJobsHistoryLimit: 4
  jobTemplate:
    spec:
      template:
        spec:
          automountServiceAccountToken: false
          restartPolicy: OnFailure
          containers:
          - name: podtest2
            image: alpine
            args:
            - wget
            - http://some_real_url/test/pod/2
            imagePullPolicy: Always
            lifecycle:
              postStart:
                exec:
                  command:
                  - "sh"
                  - "-c"
                  - "sleep 2s;"

When I run kubectl describe pod some_pod_name, I get this output (truncated):

Normal   Pulling              105s  kubelet, general-rl8c  pulling image "alpine"
Normal   Pulled               105s  kubelet, general-rl8c  Successfully pulled image "alpine"
Normal   Created              105s  kubelet, general-rl8c  Created container
Normal   Started              104s  kubelet, general-rl8c  Started container
Warning  FailedPostStartHook  104s  kubelet, general-rl8c  Exec lifecycle hook ([sh -c sleep 2s;]) for Container "podtest2" in Pod "do-registry-cleanup-1566391980-dvjdn_default(9d87fe8a-c412-11e9-8744-d2e7c0045fbd)" failed - error: command 'sh -c sleep 2s;' exited with 137: , message: ""
Normal   Killing              104s  kubelet, general-rl8c  Killing container with id docker://podtest2:FailedPostStartHook

As a result, in this example wget does request the URL, and I know that the sleep command is executed and is not broken. My questions are:

  1. Why does this occur?
  2. What are the side effects of this?

Some additional info: if the command is "cmd1; sleep; cmd2", then cmd2 is not executed. So for some reason the sleep command triggers an error in the container.

-- JOHN_16
kubernetes

2 Answers

8/21/2019

Try command: ["/bin/sh", "-c", "sleep 2s"]

-- Keilo
Source: StackOverflow

8/21/2019

Referring to the official documentation:

Pod lifecycle:

Once a container enters into Running state, postStart hook (if any) is executed.

A container enters into Terminated state when it has successfully completed execution or when it has failed for some reason. Regardless, a reason and exit code is displayed, as well as the container’s start and finish time. Before a container enters into Terminated, preStop hook (if any) is executed.

Lifecycle hooks:

There are two hooks that are exposed to Containers:

PostStart

This hook executes immediately after a container is created. However, there is no guarantee that the hook will execute before the container ENTRYPOINT. No parameters are passed to the handler.

PreStop

This hook is called immediately before a container is terminated due to an API request or management event such as liveness probe failure, preemption, resource contention and others. A call to the preStop hook fails if the container is already in terminated or completed state. It is blocking, meaning it is synchronous, so it must complete before the call to delete the container can be sent. No parameters are passed to the handler

In practice, what is written for PreStop also applies to PostStart.

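Basically, the kubelet doesn't wait until all hooks have finished. It simply terminates everything once the main container exits. In the question, wget most likely returns while the hook is still sleeping, so the hook process is killed along with the container, which matches the reported exit code 137.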
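Exit code 137 is 128 + 9, the status a shell reports for a process that was killed with SIGKILL. This can be reproduced locally without a cluster. The sketch below is only an illustration: the background `sh -c` stands in for the hook, the `kill -9` stands in for the container runtime tearing the container down, and the variable names are made up for this example.

```shell
#!/bin/sh
# Hypothetical local stand-in for the hook: cmd1, a sleep, then cmd2.
out=$(mktemp)
sh -c 'echo cmd1; sleep 5; echo cmd2' >"$out" 2>&1 &
hook_pid=$!

sleep 1                 # give the hook time to print cmd1 and start sleeping
kill -9 "$hook_pid"     # stand-in for the runtime killing the hook process
wait "$hook_pid"        # wait returns 128 + signal number for a killed job
status=$?

hook_output=$(cat "$out")
rm -f "$out"

echo "hook exit status: $status"
echo "hook output: $hook_output"
```

The status is 137 and only cmd1 appears in the output, which is the same pattern as in the question: the command after the sleep never runs because the shell running the hook is already gone.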

For PreStop we can only increase the grace period, but for PostStart we can make the main container wait until the hook has finished. Here is an example:

kind: CronJob
apiVersion: batch/v1beta1
metadata:
  name: test1
spec:
  schedule: "* * * * *"
  successfulJobsHistoryLimit: 2
  failedJobsHistoryLimit: 4
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: test1
            image: nginx
            command: ["bash", "-c", "touch file1; while [ ! -f file2 ] ; do ls file*; sleep 1 ; done; ls file*"]
            lifecycle:
              postStart:
                exec:
                  command: ["bash", "-c", "sleep 10; touch file2"]

If you check the pod's logs, you'll see that the hook created the file before the main container terminated. You can also see that the loop ran 12 times instead of 10, which means the PostStart hook started about 2 seconds after the main container started running. In other words, the container enters the Running state with some delay after it starts.

$ kubectl describe cronjob/test1 | grep Created
  Normal  SuccessfulCreate  110s  cronjob-controller  Created job test1-1566402420
$ kubectl describe job/test1-1566402420 | grep Created
  Normal  SuccessfulCreate  2m28s  job-controller  Created pod: test1-1566402420-d5lfr
$ kubectl logs pod/test1-1566402420-d5lfr -c test1
file1
file1
file1
file1
file1
file1
file1
file1
file1
file1
file1
file1
file2
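The marker-file handshake used in the example above can also be tried locally, without a cluster. This is a minimal sketch under stated assumptions: the background subshell stands in for the postStart hook, the `echo` stands in for the container's real command, and the marker path and variable names are hypothetical.

```shell
#!/bin/sh
# Local sketch of the marker-file handshake; nothing here talks to Kubernetes.
workdir=$(mktemp -d)
marker="$workdir/hook-done"     # hypothetical marker path

# Stand-in for the postStart hook: do slow work, then signal completion.
( sleep 2; touch "$marker" ) >/dev/null 2>&1 &

# Stand-in for the container's main command.
echo "main work done"

# Do not exit until the hook has signalled completion.
waited=0
while [ ! -f "$marker" ]; do
  sleep 1
  waited=$((waited + 1))
done

echo "hook finished after about ${waited}s; main process can exit now"
rm -rf "$workdir"
```

In a pod spec, the waiting loop would go at the end of the container's command and the `touch` at the end of the postStart command. An exec hook runs inside the container, so both sides see the same filesystem and any writable path can serve as the marker.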
-- VAS
Source: StackOverflow