I have a Kubernetes cluster on 1.18:
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.4", GitCommit:"c96aede7b5205121079932896c4ad89bb93260af", GitTreeState:"clean", BuildDate:"2020-06-17T11:33:59Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
I am following the 1.18 documentation for CronJobs. I have the following YAML saved in hello_world.yaml:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure
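For reference, the manifest can be validated without creating anything by using a client-side dry run (supported on 1.18):
kubectl create -f hello_world.yaml --dry-run=client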
I created the CronJob with:
kubectl create -f hello_world.yaml
cronjob.batch/hello created
However, jobs are never scheduled, even though the CronJob was created:
kubectl get cronjobs
NAME    SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
hello   */1 * * * *   False     0        <none>          5m48s
kubectl get jobs
NAME                    COMPLETIONS   DURATION   AGE
not-my-job-1624413720   1/1           43m        7d11h
not-my-job-1624500120   1/1           42m        6d11h
not-my-job-1624586520   1/1           43m        5d11h
I notice that the last job ran 5 days ago, at the time our certificates expired and developers started getting the following error:
"Unable to connect to the server: x509: certificate has expired or is not yet valid"
We regenerated the certs using the following procedure from IBM, which seemed to work at the time. These are the main commands; we also backed up configuration files etc. as described in the linked doc:
kubeadm alpha certs renew all
systemctl daemon-reload && systemctl restart kubelet
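To confirm the renewal actually landed on disk, the new dates can be checked with openssl (the path shown is just one example; any cert under /etc/kubernetes/pki works):
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -dates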
I am sure the certificate expiration and renewal have caused some issue, but I see no smoking gun.
kubectl describe cronjob hello
Name:                          hello
Namespace:                     default
Labels:                        <none>
Annotations:                   <none>
Schedule:                      */1 * * * *
Concurrency Policy:            Allow
Suspend:                       False
Successful Job History Limit:  3
Failed Job History Limit:      1
Starting Deadline Seconds:     <unset>
Selector:                      <unset>
Parallelism:                   <unset>
Completions:                   <unset>
Pod Template:
  Labels:  <none>
  Containers:
   hello:
    Image:      busybox
    Port:       <none>
    Host Port:  <none>
    Args:
      /bin/sh
      -c
      date; echo Hello from the Kubernetes cluster
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Last Schedule Time:  <unset>
Active Jobs:         <none>
Events:              <none>
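Nothing in the events either. Since the CronJob controller runs inside kube-controller-manager, I assume its logs are where a scheduling failure would show up; something along these lines (the pod name is a placeholder for whatever kubectl -n kube-system get pods lists):
kubectl -n kube-system logs kube-controller-manager-<master-node-name> | grep -i cronjob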
Any help would be greatly appreciated! Thanks.
EDIT: providing some more info:
sudo kubeadm alpha certs check-expiration
[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Jun 30, 2022 13:31 UTC   364d                                    no
apiserver                  Jun 30, 2022 13:31 UTC   364d            ca                      no
apiserver-etcd-client      Jun 30, 2022 13:31 UTC   364d            etcd-ca                 no
apiserver-kubelet-client   Jun 30, 2022 13:31 UTC   364d            ca                      no
controller-manager.conf    Jun 30, 2022 13:31 UTC   364d                                    no
etcd-healthcheck-client    Jun 30, 2022 13:31 UTC   364d            etcd-ca                 no
etcd-peer                  Jun 30, 2022 13:31 UTC   364d            etcd-ca                 no
etcd-server                Jun 30, 2022 13:31 UTC   364d            etcd-ca                 no
front-proxy-client         Jun 30, 2022 13:31 UTC   364d            front-proxy-ca          no
scheduler.conf             Jun 30, 2022 13:31 UTC   364d                                    no

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Jun 23, 2030 13:21 UTC   8y              no
etcd-ca                 Jun 23, 2030 13:21 UTC   8y              no
front-proxy-ca          Jun 23, 2030 13:21 UTC   8y              no
ls -alt /etc/kubernetes/pki/
total 68
-rw-r--r-- 1 root root 1058 Jun 30 13:31 front-proxy-client.crt
-rw------- 1 root root 1679 Jun 30 13:31 front-proxy-client.key
-rw-r--r-- 1 root root 1099 Jun 30 13:31 apiserver-kubelet-client.crt
-rw------- 1 root root 1675 Jun 30 13:31 apiserver-kubelet-client.key
-rw-r--r-- 1 root root 1090 Jun 30 13:31 apiserver-etcd-client.crt
-rw------- 1 root root 1675 Jun 30 13:31 apiserver-etcd-client.key
-rw-r--r-- 1 root root 1229 Jun 30 13:31 apiserver.crt
-rw------- 1 root root 1679 Jun 30 13:31 apiserver.key
drwxr-xr-x 4 root root 4096 Sep 9 2020 ..
drwxr-xr-x 3 root root 4096 Jun 25 2020 .
-rw------- 1 root root 1675 Jun 25 2020 sa.key
-rw------- 1 root root 451 Jun 25 2020 sa.pub
drwxr-xr-x 2 root root 4096 Jun 25 2020 etcd
-rw-r--r-- 1 root root 1038 Jun 25 2020 front-proxy-ca.crt
-rw------- 1 root root 1675 Jun 25 2020 front-proxy-ca.key
-rw-r--r-- 1 root root 1025 Jun 25 2020 ca.crt
-rw------- 1 root root 1679 Jun 25 2020 ca.key
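The files on disk look right, but a running component can still be serving the old certificate it loaded at startup. One way to see what the apiserver is actually presenting (the master IP is a placeholder):
echo | openssl s_client -connect <master-ip>:6443 2>/dev/null | openssl x509 -noout -dates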
Found a solution to this one after trying a lot of different things; I forgot to update at the time. The certs were renewed after they had already expired. I suspect this prevented the renewed certs from being picked up by the different components in the cluster, so nothing could talk to the API.
This is a three-node cluster. I cordoned the worker nodes, stopped the kubelet service on them, stopped the Docker containers and the Docker service, started the Docker service again (which brought up new containers), started the kubelet, and uncordoned the nodes, then carried out the same procedure on the master node. This forced the synchronisation of certs and keys across the different components.
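Roughly, the per-node procedure looked like this (a sketch rather than the exact commands I ran; <node> is a placeholder, workers first and the master last):
kubectl cordon <node>
# then, on the node itself:
systemctl stop kubelet
docker stop $(docker ps -q)   # stop all running containers
systemctl restart docker      # kubelet recreates the containers when it starts
systemctl start kubelet
# back on a machine with a working kubeconfig:
kubectl uncordon <node>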