Monitor a CronJob running on GKE

9/16/2019

I'm trying to monitor a CronJob running on GKE, but I cannot see an easy way of checking whether the CronJob is actually running. I want to trigger an alert if the CronJob has not run for more than X amount of time, and Stackdriver does not seem to support that.

So far I have tried alerts based on logging metrics, but those only let me alert on an app crash or on specific errors, not on platform errors themselves.

I investigated a solution using Prometheus alerts. Can that be integrated into Stackdriver?

-- Ricardo Gomes
cloud
cron
google-cloud-platform
kubernetes
monitoring

1 Answer

9/16/2019

Since it's a CronJob, which starts standard Kubernetes Jobs, you could query for the Job, check its start time, and compare that to the current time.

Note: I'm not familiar with Stackdriver, so this may not be what you want, but...

E.g. with bash:

# -r makes jq print the raw timestamp without the surrounding JSON quotes
START_TIME=$(kubectl -n=your-namespace get job your-job-name -o json | jq -r '.status.startTime')
echo "$START_TIME"

You can also get the current status of the job as a JSON blob like this:

kubectl -n=your-namespace get job your-job-name -o json | jq '.status'

This would give a result like:

{
  "completionTime": "2019-09-06T17:13:51Z",
  "conditions": [
    {
      "lastProbeTime": "2019-09-06T17:13:51Z",
      "lastTransitionTime": "2019-09-06T17:13:51Z",
      "status": "True",
      "type": "Complete"
    }
  ],
  "startTime": "2019-09-06T17:13:49Z",
  "succeeded": 1
}

You can use a tool like jq in your checking script to look at the succeeded field or the condition type to see whether the job was successful.
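As a rough sketch of that check, the helper below reads a status blob like the one above on stdin and succeeds only if it contains a condition of type Complete with status True. The function name job_is_complete is made up for illustration, and it assumes jq 1.5 or newer is installed.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of kubectl or jq): reads a Job's .status
# JSON on stdin and exits 0 only if some condition has type "Complete"
# and status "True". A status with no conditions at all yields exit 1.
job_is_complete() {
  jq -e 'any(.conditions[]?; .type == "Complete" and .status == "True")' >/dev/null
}

# Example wiring (needs cluster access; namespace and job name are placeholders):
# kubectl -n=your-namespace get job your-job-name -o json | jq '.status' | job_is_complete \
#   && echo "job completed"
```

The exit code makes it easy to use in an if statement inside a larger checking script.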

With your START_TIME value, you can then compare it (or the job's completionTime) against the current time. If the job last ran longer ago than your threshold allows, trigger your alert, e.g. POST to a Slack webhook to send a notification, or use whatever other alert system you have.
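A minimal sketch of that time comparison could look like the following. It assumes GNU date (for parsing the timestamp with -d). The function name job_is_stale, the 3600-second threshold, and the SLACK_WEBHOOK_URL variable are all hypothetical placeholders, not anything provided by kubectl or Stackdriver.

```shell
#!/usr/bin/env bash

# Hypothetical helper: exits 0 when start_time lies more than
# max_age_seconds before "now", 1 when it is recent enough, and 2 when
# the timestamp is missing or cannot be parsed.
# Args: start_time (RFC 3339, e.g. 2019-09-06T17:13:49Z),
#       max_age_seconds,
#       now_epoch (optional; defaults to the current time).
job_is_stale() {
  local start_time="$1" max_age_seconds="$2" now_epoch="${3:-$(date +%s)}"
  local start_epoch
  # jq -r prints the literal string "null" when the field is absent
  if [ -z "$start_time" ] || [ "$start_time" = "null" ]; then
    return 2
  fi
  start_epoch=$(date -u -d "$start_time" +%s) || return 2
  (( now_epoch - start_epoch > max_age_seconds ))
}

# Example wiring (placeholders: namespace, job name, webhook URL):
# START_TIME=$(kubectl -n=your-namespace get job your-job-name -o json | jq -r '.status.startTime')
# if job_is_stale "$START_TIME" 3600; then
#   curl -s -X POST -H 'Content-Type: application/json' \
#     --data '{"text":"Job has not started in the last hour"}' "$SLACK_WEBHOOK_URL"
# fi
```

Because a CronJob creates Jobs with generated names, it may be easier to feed in the CronJob's own .status.lastScheduleTime (from kubectl get cronjob) instead of a single Job's startTime.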

-- Shogan
Source: StackOverflow