Kubernetes cronjob never scheduling, no errors

6/30/2021

I have a kubernetes cluster on 1.18:

Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.4", GitCommit:"c96aede7b5205121079932896c4ad89bb93260af", GitTreeState:"clean", BuildDate:"2020-06-17T11:33:59Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}

I am following the documentation for 1.18 cronjobs. I have the following yaml saved in hello_world.yaml:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure

I created the cronjob with

kubectl create -f hello_world.yaml
cronjob.batch/hello created

However the jobs are never scheduled, despite the cronjob being created:

kubectl get cronjobs
NAME                               SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
hello                              */1 * * * *   False     0        <none>          5m48s



kubectl get jobs
NAME                                       COMPLETIONS   DURATION   AGE
not-my-job-1624413720                      1/1           43m        7d11h
not-my-job-1624500120                      1/1           42m        6d11h
not-my-job-1624586520                      1/1           43m        5d11h

I notice that the last job ran 5 days ago, which is when our certificates expired and developers started getting the following error:

"Unable to connect to the server: x509: certificate has expired or is not yet valid"

We regenerated the certs using a procedure from IBM, which seemed to work at the time. These are the main commands; we also backed up the configuration files etc. as per the linked doc:

kubeadm alpha certs renew all

systemctl daemon-reload && systemctl restart kubelet

I am sure the certificate expiration and renewal has caused some issue, but I see no smoking gun.
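CronJob scheduling is handled by the kube-controller-manager, so when jobs never fire and the CronJob shows no events, its logs are a good place to look for cert/auth errors after a renewal. A diagnostic sketch (the `component` label selector assumes a kubeadm cluster, where control-plane pods carry that label):

```shell
# The CronJob controller runs inside kube-controller-manager;
# look for x509 / authentication errors after the cert renewal.
kubectl -n kube-system logs -l component=kube-controller-manager --tail=100

# Confirm the control-plane pods are actually healthy:
kubectl -n kube-system get pods -o wide

# Check recent cluster-wide events for clues:
kubectl get events --sort-by=.metadata.creationTimestamp | tail -n 20
```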

kubectl describe cronjob hello
Name:                          hello
Namespace:                     default
Labels:                        <none>
Annotations:                   <none>
Schedule:                      */1 * * * *
Concurrency Policy:            Allow
Suspend:                       False
Successful Job History Limit:  3
Failed Job History Limit:      1
Starting Deadline Seconds:     <unset>
Selector:                      <unset>
Parallelism:                   <unset>
Completions:                   <unset>
Pod Template:
  Labels:  <none>
  Containers:
   hello:
    Image:      busybox
    Port:       <none>
    Host Port:  <none>
    Args:
      /bin/sh
      -c
      date; echo Hello from the Kubernetes cluster
    Environment:     <none>
    Mounts:          <none>
  Volumes:           <none>
Last Schedule Time:  <unset>
Active Jobs:         <none>
Events:              <none>

Any help would be greatly appreciated! Thanks.

EDIT: providing some more info:

sudo kubeadm alpha certs check-expiration

[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'

CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Jun 30, 2022 13:31 UTC   364d                                    no
apiserver                  Jun 30, 2022 13:31 UTC   364d            ca                      no
apiserver-etcd-client      Jun 30, 2022 13:31 UTC   364d            etcd-ca                 no
apiserver-kubelet-client   Jun 30, 2022 13:31 UTC   364d            ca                      no
controller-manager.conf    Jun 30, 2022 13:31 UTC   364d                                    no
etcd-healthcheck-client    Jun 30, 2022 13:31 UTC   364d            etcd-ca                 no
etcd-peer                  Jun 30, 2022 13:31 UTC   364d            etcd-ca                 no
etcd-server                Jun 30, 2022 13:31 UTC   364d            etcd-ca                 no
front-proxy-client         Jun 30, 2022 13:31 UTC   364d            front-proxy-ca          no
scheduler.conf             Jun 30, 2022 13:31 UTC   364d                                    no

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Jun 23, 2030 13:21 UTC   8y              no
etcd-ca                 Jun 23, 2030 13:21 UTC   8y              no
front-proxy-ca          Jun 23, 2030 13:21 UTC   8y              no
ls -alt /etc/kubernetes/pki/
total 68
-rw-r--r-- 1 root root 1058 Jun 30 13:31 front-proxy-client.crt
-rw------- 1 root root 1679 Jun 30 13:31 front-proxy-client.key
-rw-r--r-- 1 root root 1099 Jun 30 13:31 apiserver-kubelet-client.crt
-rw------- 1 root root 1675 Jun 30 13:31 apiserver-kubelet-client.key
-rw-r--r-- 1 root root 1090 Jun 30 13:31 apiserver-etcd-client.crt
-rw------- 1 root root 1675 Jun 30 13:31 apiserver-etcd-client.key
-rw-r--r-- 1 root root 1229 Jun 30 13:31 apiserver.crt
-rw------- 1 root root 1679 Jun 30 13:31 apiserver.key
drwxr-xr-x 4 root root 4096 Sep  9  2020 ..
drwxr-xr-x 3 root root 4096 Jun 25  2020 .
-rw------- 1 root root 1675 Jun 25  2020 sa.key
-rw------- 1 root root  451 Jun 25  2020 sa.pub
drwxr-xr-x 2 root root 4096 Jun 25  2020 etcd
-rw-r--r-- 1 root root 1038 Jun 25  2020 front-proxy-ca.crt
-rw------- 1 root root 1675 Jun 25  2020 front-proxy-ca.key
-rw-r--r-- 1 root root 1025 Jun 25  2020 ca.crt
-rw------- 1 root root 1679 Jun 25  2020 ca.key
-- Dobhaweim
kubectl
kubelet
kubernetes

1 Answer

7/14/2021

Found a solution to this one after trying a lot of different things; I forgot to update the post at the time. The certs were renewed after they had already expired. I suspect this prevented the renewed certs from being picked up by the different components in the cluster, so nothing could talk to the API.

This is a three-node cluster. On each worker node I cordoned the node, stopped the kubelet service, stopped the docker containers and the docker service, started docker again so new containers came up, started the kubelet, and uncordoned the node. I then carried out the same procedure on the master node. This forced the components to pick up the renewed certs and keys.
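The steps above can be sketched as shell commands (node names like worker-1 are placeholders; run the sequence one node at a time, master last):

```shell
# From a machine with cluster access: drain new workloads off the node.
kubectl cordon worker-1

# On worker-1 itself: stop the kubelet, recycle the container runtime.
sudo systemctl stop kubelet
sudo docker stop $(sudo docker ps -q)   # stop all running containers
sudo systemctl restart docker           # restart the docker service
sudo systemctl start kubelet            # kubelet recreates the pods

# Back on the control plane: allow scheduling again.
kubectl uncordon worker-1

# Repeat the same stop/restart sequence on the master node last.
```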

-- Dobhaweim
Source: StackOverflow