Correct way to use Kubernetes watches

10/9/2018

I am new to Kubernetes and am not sure how to implement a watch correctly; in particular, I am not sure how to deal with the resourceVersion parameter.

The goal is to watch for new pods with a specific label and, in case of an error or disconnection from the cluster, to be able to restart the watch from the last event that occurred.

I am doing something like this:

// after setting up the connection and some parameters
String lastResourceVersion = null; // at beginning version is unknown
while (true) {
  try {
    Watch<V1Pod> watcher = Watch.createWatch(
            client,
            api.listNamespacedPodCall(namespace, pretty, fieldSelector, labelSelector, lastResourceVersion, forEver, true, null, null),
            new TypeToken<Watch.Response<V1Pod>>() {}.getType()
    );
    for (Watch.Response<V1Pod> item : watcher) {
      // remember the resourceVersion of the last event received
      lastResourceVersion = item.object.getMetadata().getResourceVersion();
      // do some stuff with the pod
    }
  } catch (ApiException apiException) {
    log.error("restarting the watch from "+lastResourceVersion, apiException);
  }
}

Is it correct to use the resourceVersion of a Pod to reinitialize the watch call? Is this number a kind of timestamp for all the events in the cluster, or do different APIs use different sequences?

Do I need to watch for specific exceptions, e.g. in case the resourceVersion is too old?

thanks

-- G. Bricconi
kubernetes
watch

1 Answer

11/13/2018

Adam is right.

This is best explained by https://kubernetes.io/docs/reference/using-api/api-concepts/#efficient-detection-of-changes

Quoting relevant parts (emphasis mine):

When retrieving a collection of resources (either namespace or cluster scoped), the response from the server will contain a resourceVersion value that can be used to initiate a watch against the server.

... snip ...

When the requested watch operations fail because the historical version of that resource is not available, clients must handle the case by recognizing the status code 410 Gone, clearing their local cache, performing a list operation, and starting the watch from the resourceVersion returned by that new list operation.

So before you call watch, you should list and pull the resourceVersion from the list (not the objects inside of it). Then start the watch with that resourceVersion. If the watch fails for some reason, you will have to list again and then use the resourceVersion from that list to re-establish the watch.
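
The list-then-watch cycle described above could be sketched roughly as follows. This is only an illustration of the control flow, not the Kubernetes Java client API: PodApi, PodList, PodHandler and the maxRounds parameter are hypothetical names made up for this example, and the in-memory fake in main stands in for a real cluster. With the Java client from the question, the list-level value should be available from the returned V1PodList via getMetadata().getResourceVersion(), though that is worth checking against the client version in use.

```java
import java.util.ArrayList;
import java.util.List;

public class ListThenWatch {

    /** Hypothetical result of a list call: the pods plus the collection-level resourceVersion. */
    static final class PodList {
        final List<String> podNames;
        final String resourceVersion;

        PodList(List<String> podNames, String resourceVersion) {
            this.podNames = podNames;
            this.resourceVersion = resourceVersion;
        }
    }

    /** Hypothetical client abstraction; a real one would wrap the list and watch HTTP calls. */
    interface PodApi {
        PodList list() throws Exception;

        /** Blocks while streaming events to the handler; throws if the watch breaks (e.g. 410 Gone). */
        void watch(String resourceVersion, PodHandler handler) throws Exception;
    }

    /** Hypothetical callbacks for the caller's own logic. */
    interface PodHandler {
        void onList(List<String> podNames); // replace any cached state with this snapshot
        void onEvent(String event);         // apply one incremental change
    }

    /** Runs up to maxRounds list+watch cycles; a real loop would likely run until shutdown. */
    static void run(PodApi api, PodHandler handler, int maxRounds) {
        for (int round = 0; round < maxRounds; round++) {
            try {
                PodList snapshot = api.list();
                handler.onList(snapshot.podNames);
                // Start from the list's own resourceVersion, not from a pod inside the list.
                api.watch(snapshot.resourceVersion, handler);
            } catch (Exception e) {
                // Watch (or list) failed: fall through and list again on the next round.
                System.err.println("watch interrupted, listing again: " + e.getMessage());
            }
        }
    }

    public static void main(String[] args) {
        final List<String> log = new ArrayList<>();
        // In-memory fake: the first watch fails, the second delivers one event and ends.
        PodApi fake = new PodApi() {
            int lists = 0;

            public PodList list() {
                lists++;
                log.add("list#" + lists);
                return new PodList(new ArrayList<String>(), String.valueOf(lists * 100));
            }

            public void watch(String rv, PodHandler h) throws Exception {
                log.add("watch@" + rv);
                if (rv.equals("100")) throw new Exception("410 Gone");
                h.onEvent("ADDED pod-a");
            }
        };
        PodHandler handler = new PodHandler() {
            public void onList(List<String> podNames) { log.add("snapshot:" + podNames.size()); }
            public void onEvent(String event) { log.add(event); }
        };
        run(fake, handler, 2);
        System.out.println(log);
    }
}
```

Because every cycle begins with a fresh list, the handler has to treat onList as a full resynchronization: anything cached from before the failure may be stale. Relisting after every interruption is the simplest reading of the advice above; a production loop would probably also back off between retries.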

-- krousey
Source: StackOverflow