Using the Python client for Kubernetes, I've written a small service that watches for new Pods and sends the data to an external service for metrics gathering. It works as expected at first, but after a few days the Watch silently stops receiving changes. It doesn't report any errors or raise any exceptions; it just behaves as if there are no more changes. If I start a new watch I can see new changes coming in, and restarting the container also resumes processing, but it seems I can't keep a single process running continuously.
I'm running on GKE, and I wonder whether the Kubernetes API endpoint occasionally becomes unavailable. If so, all I want is to resume once it's available again. I'd even be happy with the pod crashing and restarting in that case, but the Watch reports nothing at all, so there's no error condition I can attempt to handle.
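To make the goal concrete, the behaviour I'd settle for looks roughly like the sketch below: re-establish the watch in a loop, assuming the timeout_seconds parameter (which Watch.stream passes through to the underlying API call as a server-side timeout) reliably ends each watch so the loop can reconnect. This isn't what I'm running, just the shape of what I'm after:

from kubernetes import client, config, watch

def watch_pods_forever():
    config.load_incluster_config()
    v1 = client.CoreV1Api()
    resource_version = None
    while True:
        w = watch.Watch()
        kwargs = {'timeout_seconds': 300}
        if resource_version:
            kwargs['resource_version'] = resource_version
        # The server closes the watch after timeout_seconds, the for-loop
        # ends cleanly, and the while-loop starts a fresh watch.
        for message in w.stream(v1.list_pod_for_all_namespaces, **kwargs):
            resource_version = message['object'].metadata.resource_version
            # ... process the pod change here ...

In practice this would also need to handle a 410 Gone from the API server when the saved resourceVersion has expired, but it illustrates the kind of self-resuming loop I mean.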
Here are the relevant parts of my code:
import os

from kubernetes import watch

# app, get_kubernetes_config, get_resource_version and
# RESOURCE_VERSION_FILE are defined elsewhere in the application.


def main():
    log = app.logger.get()
    kube_api = get_kubernetes_config()
    resource_version = get_resource_version(kube_api)
    watch_params = {
        'resource_version': resource_version
    }
    log.debug(f'Watching from resource version {resource_version}')
    w = watch.Watch()
    stream = w.stream(kube_api.list_pod_for_all_namespaces, **watch_params)
    log.info('Started watching for new pods')
    for message in stream:
        process_pod_change(message['object'], log)
def process_pod_change(pod, log):
    # Ignore pods that are terminating, have no container statuses yet,
    # or aren't fully ready.
    if (pod.metadata.deletion_timestamp is not None
            or pod.status.container_statuses is None
            or not all(status.ready for status in pod.status.container_statuses)):
        return

    pod_name = f'{pod.metadata.namespace}/{pod.metadata.name}'
    for status in pod.status.container_statuses:
        docker_image_sha = status.image_id.split('@')[-1]
        report_deployment(docker_image_sha, pod_name, status.name, log)

    # Persist the last seen resourceVersion so we can resume from it.
    with open(RESOURCE_VERSION_FILE, 'w') as f:
        f.write(str(pod.metadata.resource_version))
def report_deployment(sha, pod_name, container_name, log):
    log.info(f'Seen new deployment of {pod_name} container {container_name}: {sha}')
    authorised_session = app.auth.get_authorised_session()
    jsonbody = {
        'artefact_type': 'docker',
        'artefact_id': sha,
        'client': os.environ['CLIENT'],
        'environment': os.environ['ENVIRONMENT'],
        'product': os.environ['PRODUCT']
    }
    r = authorised_session.post(os.environ['NOTIFICATION_URL'], json=jsonbody)
    r.raise_for_status()
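For completeness, since it's referenced above but not shown: get_resource_version resumes from the file written in process_pod_change, falling back to the resourceVersion of a fresh pod list. This is a simplified sketch of the helper rather than the exact code:

def get_resource_version(kube_api):
    # Resume from the last resourceVersion we persisted, if there is one.
    try:
        with open(RESOURCE_VERSION_FILE) as f:
            return f.read().strip()
    except FileNotFoundError:
        # First run: start from the current state of the cluster.
        return kube_api.list_pod_for_all_namespaces().metadata.resource_version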
The resulting logs show a continuous stream of processed messages until, at some point, they simply stop coming in. There's no indication in the logs of anything odd happening. I also believe this is related to the Kubernetes Watch rather than to any downstream processing I'm doing, because this is the second application I've written that has exhibited this behaviour of a Watch seemingly falling asleep and doing nothing.
Am I using this correctly? I can't find many examples online, and no one else seems to have this problem, so I haven't found any workarounds.
My cluster version is 1.14.10-gke.27, I'm using the python:3.6-alpine container image, and my Python dependencies are only from the past couple of weeks. But I also saw the same problem over six months ago in an earlier attempt to use Watch.