I have a Kubernetes v1.10.2 cluster with a CronJob on it. The CronJob's history limits are set to:
failedJobsHistoryLimit: 1
successfulJobsHistoryLimit: 3
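For context, both of these fields belong on the CronJob spec itself, not on the Jobs it creates. A minimal sketch of where they sit (the schedule and container details here are placeholders inferred from the describe output further down; batch/v1beta1 is the CronJob API version on 1.10):
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: test-elk-xxx
spec:
  schedule: "0 1 * * *"            # placeholder schedule, not shown in the question
  failedJobsHistoryLimit: 1
  successfulJobsHistoryLimit: 3
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: xxx
            image: test-elk-xxx:18.03-3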
But it has created more than ten jobs, all of which are successful and are not removed automatically. Now I am trying to delete them manually with kubectl delete job XXX
, but the command times out:
$ kubectl delete job XXX
error: timed out waiting for "XXX" to be synced
I want to know how I can debug this situation. Is there a log file for the command execution?
I only know the kubectl logs
command, but it does not apply here.
"kubectl get" shows the job has already finished:
status:
  active: 1
  completionTime: 2018-08-27T21:20:21Z
  conditions:
  - lastProbeTime: 2018-08-27T21:20:21Z
    lastTransitionTime: 2018-08-27T21:20:21Z
    status: "True"
    type: Complete
  failed: 3
  startTime: 2018-08-27T01:00:00Z
  succeeded: 1
and "kubectl describe" output as:
$ kubectl describe job test-elk-xxx-1535331600 -ntest
Name:           test-elk-xxx-1535331600
Namespace:      test
Selector:       controller-uid=863a14e3-a994-11e8-8bd7-fa163e23632f
Labels:         controller-uid=863a14e3-a994-11e8-8bd7-fa163e23632f
                job-name=test-elk-xxx-1535331600
Annotations:    <none>
Controlled By:  CronJob/test-elk-xxx
Parallelism:    0
Completions:    1
Start Time:     Mon, 27 Aug 2018 01:00:00 +0000
Pods Statuses:  1 Running / 1 Succeeded / 3 Failed
Pod Template:
  Labels:  controller-uid=863a14e3-a994-11e8-8bd7-fa163e23632f
           job-name=test-elk-xxx-1535331600
  Containers:
   xxx:
    Image:      test-elk-xxx:18.03-3
    Port:       <none>
    Host Port:  <none>
    Args:
      --config
      /etc/elasticsearch-xxx/xxx.yml
      /etc/elasticsearch-xxx/actions.yml
    Limits:
      cpu:     100m
      memory:  100Mi
    Requests:
      cpu:     100m
      memory:  100Mi
    Environment:  <none>
    Mounts:
      /etc/elasticsearch-xxx from xxx-configs (ro)
  Volumes:
   xxx-configs:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      test-elk-xxx
    Optional:  false
Events:  <none>
It indicates that one pod is still running, but I don't know how to figure out that pod's name.
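For reference, the pods created by this job carry the job-name and controller-uid labels shown in the Pod Template above, so one way to list them is to filter on those labels (namespace and label values taken from the describe output):
kubectl get pods -ntest -l job-name=test-elk-xxx-1535331600
# or, using the controller-uid from the Job's selector:
kubectl get pods -ntest -l controller-uid=863a14e3-a994-11e8-8bd7-fa163e23632f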
I think this is the same as the problem reported on GitHub:
Cannot delete jobs when their associated pods are gone
It has been reported by several people and is still not fixed.
You can also use the "-v=X" option (e.g. -v=8) with the kubectl command; it will print more detailed debug info.
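For example, re-running the failing delete with verbose logging shows the API requests the client makes (job name and namespace taken from the question):
kubectl delete job test-elk-xxx-1535331600 -ntest -v=8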
As taken from https://github.com/kubernetes/kubernetes/issues/43168#issuecomment-375700293:
Try using --cascade=false
in your delete job command.
It worked for me.
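For example (job name and namespace taken from the question; --cascade=false removes the Job object itself while leaving any dependent pods in place):
kubectl delete job test-elk-xxx-1535331600 -ntest --cascade=false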
Check if kubectl describe pod <pod name>
(the pod associated with the job) still returns something, which would mean the pod still exists in some state even though the job has finished.
In that state, you can then consider a force deletion.
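A sketch of that force deletion, assuming the stuck pod's name has been found (e.g. via the label lookup shown earlier); use with care, since it skips graceful termination:
kubectl delete pod <pod-name> -ntest --grace-period=0 --force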