In an EKS Kubernetes cluster, I have a cronjob that creates a pod every 5 minutes. The cronjob always completes successfully, but it sometimes emits a FailedCreatePodSandBox
warning event whose cause I cannot work out. The cronjob still runs without any problem when this happens. The event log is as follows:
39m Warning FailedCreatePodSandBox Pod Failed create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "synthetic-test-cronjob-1565344380-bg97c": Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"read init-p: connection reset by peer\"": unknown
34m Warning FailedCreatePodSandBox Pod Failed create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "synthetic-test-cronjob-1565344680-xq9rl": Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"read init-p: connection reset by peer\"": unknown
24m Warning FailedCreatePodSandBox Pod Failed create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "synthetic-test-cronjob-1565345280-v5pz9": Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"\"": unknown
9m39s Warning FailedCreatePodSandBox Pod Failed create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "synthetic-test-cronjob-1565346180-xxpmc": Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:301: running exec setns process for init caused \"signal: killed\"": unknown
As you can see, two different line numbers appear in the error messages: process_linux.go:402 and process_linux.go:301.
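To see how often each variant occurs, the events can be tallied with a short script. This is only a minimal sketch: it assumes the event text has been saved one event per line in exactly the format shown above, and `summarize_event` and `tally` are hypothetical helper names, not part of any Kubernetes tooling.

```python
import re
from collections import Counter

# Matches the runc source location, the step that failed, and the quoted
# cause, e.g.  process_linux.go:402: container init caused \"read init-p: ...\"
_CAUSE_RE = re.compile(r'(process_linux\.go:\d+): (.*?) caused \\"(.*?)\\"')
# Matches the pod name in:  for pod "synthetic-test-cronjob-..."
_POD_RE = re.compile(r'for pod "([^"]+)"')


def summarize_event(line):
    """Return (pod, location, step, cause) for one event line, or None."""
    pod = _POD_RE.search(line)
    cause = _CAUSE_RE.search(line)
    if not (pod and cause):
        return None
    return pod.group(1), cause.group(1), cause.group(2), cause.group(3)


def tally(lines):
    """Count events per (location, cause) pair, skipping unparsable lines."""
    counts = Counter()
    for line in lines:
        parsed = summarize_event(line)
        if parsed:
            _, location, _, cause = parsed
            counts[(location, cause)] += 1
    return counts
```

Grouping by both the location and the cause keeps the "connection reset by peer" and empty-cause variants of process_linux.go:402 apart from the "signal: killed" variant of process_linux.go:301.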
What are the possible reasons for this warning, and how can I prevent it? Or should I just ignore it, since it doesn't affect the cronjob?
It looks like there are known issues matching the error messages in your example. Take a look at the following GitHub issues:
https://github.com/kubernetes/kubernetes/issues/68190
https://github.com/opencontainers/runc/issues/1914
We believe this error may also occur when a container exceeds one of its cgroup limits (e.g. memory, CPU, pids).
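One way to test the cgroup hypothesis on an affected node is to compare current usage against the configured limit in the relevant cgroup directory. The following is a minimal sketch, not a definitive diagnostic: `cgroup_headroom` is a hypothetical helper, and the directory to pass in is an assumption you have to locate yourself, since its path depends on the cgroup version and driver in use. The file names are the standard kernel ones for pids and memory under cgroup v1 and v2.

```python
from pathlib import Path

# (usage file, limit file) pairs; names differ between cgroup v1 and v2.
_PAIRS = [
    ("pids.current", "pids.max"),                         # v1 and v2
    ("memory.usage_in_bytes", "memory.limit_in_bytes"),   # v1
    ("memory.current", "memory.max"),                     # v2
]


def cgroup_headroom(cgroup_dir):
    """Return {limit_file_name: (usage, limit)} for each pair present.

    A limit of None means the kernel reports "max", i.e. no limit is set.
    Pairs whose files are missing in cgroup_dir are skipped.
    """
    base = Path(cgroup_dir)
    report = {}
    for usage_name, limit_name in _PAIRS:
        usage_file = base / usage_name
        limit_file = base / limit_name
        if not (usage_file.is_file() and limit_file.is_file()):
            continue
        raw_limit = limit_file.read_text().strip()
        limit = None if raw_limit == "max" else int(raw_limit)
        usage = int(usage_file.read_text().strip())
        report[limit_name] = (usage, limit)
    return report
```

If usage sits close to the limit around the time a FailedCreatePodSandBox event is recorded, that supports the cgroup explanation; if there is plenty of headroom, the linked issues are the more likely cause.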