K8s - Node alerts

8/10/2019

How can I configure GCP to send me alerts when node events (create/shutdown) happen? I would like to receive an email alerting me about cluster scaling.

Thanks

-- Marcos Vidolin
alert
google-cloud-platform
kubernetes

2 Answers

8/15/2019

For a faster approach than GCP log sinks, you could also consider watching nodes from inside the cluster with a Kubernetes node watcher.

You can see an example in https://github.com/notify17/k8s-node-watcher-example/blob/5fc3f802de69f65866cc8f37c4b0e721835ea5b9/main.go#L83.

This example uses Notify17 to send notifications directly to your browser or mobile phone.

The relevant code is:

// Sets up the nodes watcher
watcher, err := api.Nodes().Watch(listOptions)
if err != nil {
    panic(err)
}
// ...
ch := watcher.ResultChan()

for event := range ch {
    // Skip events whose object is not a Node (e.g. watch errors)
    node, ok := event.Object.(*v1.Node)
    if !ok {
        continue
    }
    // ...

    switch event.Type {
    case watch.Added:
        // ...
        // Triggers a Notify17 notification for the ADDED event
        notify17(httpClient,
            "Node added", fmt.Sprintf("Node %s has been added", node.Name))
    case watch.Deleted:
        // ...
        // Triggers a Notify17 notification for the DELETED event
        notify17(httpClient,
            "Node deleted", fmt.Sprintf("Node %s has been deleted", node.Name))
    }
}
// ...
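The event-to-message mapping in the switch can also be pulled out into a small pure function, so it can be tested without a cluster. This is a minimal sketch, assuming you mirror client-go's `watch.EventType` string values locally so it compiles without client-go; `EventType`, `Notification` and `nodeNotification` are hypothetical names, not part of the example repository.

```go
package main

import "fmt"

// EventType mirrors the string values of watch.EventType in
// k8s.io/apimachinery. It is redefined here (an assumption for this
// sketch) only so the code compiles without client-go.
type EventType string

const (
	Added    EventType = "ADDED"
	Modified EventType = "MODIFIED"
	Deleted  EventType = "DELETED"
)

// Notification is a hypothetical title/body pair handed to whatever
// notifier you use (Notify17, email, chat, ...).
type Notification struct {
	Title string
	Body  string
}

// nodeNotification returns the notification for a node event, and false
// when the event type should not produce one. MODIFIED is ignored because
// it fires on any change to the Node object, including status updates.
func nodeNotification(t EventType, nodeName string) (Notification, bool) {
	switch t {
	case Added:
		return Notification{Title: "Node added", Body: fmt.Sprintf("Node %s has been added", nodeName)}, true
	case Deleted:
		return Notification{Title: "Node deleted", Body: fmt.Sprintf("Node %s has been deleted", nodeName)}, true
	default:
		return Notification{}, false
	}
}
```

Inside the watch loop you would call `nodeNotification(EventType(event.Type), node.Name)` and only send when the second result is true; the conversion works because `watch.EventType` is itself a string type.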

You can test out this approach by following the instructions provided in the README.

Note: the drawback of this method is that if the node the watcher pod runs on is deleted or killed ungracefully, the event for that node may never be reported. If the node is instead removed gracefully, as the cluster autoscaler does, the pod will probably be rescheduled onto another node before the old node is deleted, so the notification is still sent.

-- cmaster11
Source: StackOverflow

8/10/2019

First, note that you can retrieve such events in Stackdriver Logging with the following filter:

logName="projects/[PROJECT_NAME]/logs/cloudaudit.googleapis.com%2Factivity" AND
(
    protoPayload.methodName="io.k8s.core.v1.nodes.create" OR
    protoPayload.methodName="io.k8s.core.v1.nodes.delete"
)

This filter retrieves only audit activity log entries (cloudaudit.googleapis.com%2Factivity) in your project [PROJECT_NAME] that correspond to a node creation (io.k8s.core.v1.nodes.create) or node deletion (io.k8s.core.v1.nodes.delete) event.
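If you manage several projects, the filter can be generated instead of edited by hand. A minimal Go sketch, assuming you only need to substitute the project ID and method names; `nodeAuditFilter` is a hypothetical helper. It also shows that the `%2F` in the log name is simply the URL-encoded `/` of the log ID `cloudaudit.googleapis.com/activity`.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// nodeAuditFilter builds the Logging filter above for one project.
// The log ID must be URL-encoded inside logName, so url.PathEscape
// turns "cloudaudit.googleapis.com/activity" into
// "cloudaudit.googleapis.com%2Factivity".
func nodeAuditFilter(projectID string, methods ...string) string {
	logName := fmt.Sprintf("projects/%s/logs/%s", projectID,
		url.PathEscape("cloudaudit.googleapis.com/activity"))

	// One equality clause per audited method, OR-ed together.
	clauses := make([]string, len(methods))
	for i, m := range methods {
		clauses[i] = fmt.Sprintf(`protoPayload.methodName="%s"`, m)
	}
	return fmt.Sprintf(`logName="%s" AND (%s)`, logName, strings.Join(clauses, " OR "))
}
```

The resulting single-line string is equivalent to the multi-line filter above and can be used in the Logs Viewer or as the filter of a sink or logs-based metric.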

There are several ways to get alerted when such a log entry is generated.

You could configure a sink that exports log entries matching this filter to a Pub/Sub topic, and have that topic trigger a Cloud Function whenever a matching entry arrives. The Cloud Function would contain the logic that sends you an email. This is probably the solution I'd choose, since this use case is described in the documentation.

Otherwise, you could define a logs-based metric from this filter (or one logs-based metric for creation and another for deletion), and configure an alert in Stackdriver Monitoring that fires when this metric increases. This alert could be configured to send an email. However, I wouldn't recommend this, because it is not a real "alert" (in the sense of "something went wrong") but rather informational. You probably don't want an incident opened in Stackdriver Monitoring every time a node is created or deleted. You can still keep the idea of one or more logs-based metrics and process them with a custom application.

-- norbjd
Source: StackOverflow