Upgrade strategy for Pub/Sub subscription application

4/16/2021

I’m curious how to approach an upgrade or restart of an application that consumes messages from a Google Pub/Sub subscription.

For example, I am particularly interested in developing a Go application that is deployed in Kubernetes, runs as multiple pods, and consumes messages from a Google Pub/Sub subscription. My concern is how to ensure that no messages are missed (or processed twice) while a pod is being upgraded.

I understand the application reads a message from the subscription and must then acknowledge that it has received it. I suspect there may be a race condition between acknowledging the message and the pod shutting down for the upgrade.

I know doing something similar is possible with a Dataflow job as you can stop a streaming job and signal it to drain the messages.

I assume there has to be some way to handle this gracefully, or is this really a situation where Dataflow is better suited?

-- walshbm15
dataflow
go
google-cloud-dataflow
google-cloud-pubsub
kubernetes

1 Answer

4/16/2021

When Kubernetes terminates a pod, it sends SIGTERM to the containers, waits 30 seconds by default, and then sends SIGKILL to anything still running. This gives your application time to shut down cleanly before it is killed. If the 30-second default is not enough, you can raise it with the pod spec field terminationGracePeriodSeconds, for example terminationGracePeriodSeconds: 60 (link 1).

You then need to add logic in your Go application to catch the SIGTERM signal (link 2).

Lastly, assuming your queue is RabbitMQ here (other queues have similar functionality), on receipt of a SIGTERM you can write logic to A) stop receiving new messages and then B) (this is optional; you could just let them finish) send a NACK with requeue for every message the pod has received but not finished processing, putting those messages back on the queue (links 3 and 4).

If you can avoid implementing NACK/requeue and just handle SIGTERM by closing your queue listener and finishing the messages currently held (assuming 30 or 60 seconds is sufficient to do so), that is much simpler and is the recommended approach.

  1. https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-terminating-with-grace

  2. https://stackoverflow.com/questions/18106749/golang-catch-signals

  3. https://stackoverflow.com/questions/28794123/ack-or-nack-in-rabbitmq/28817796

  4. https://www.rabbitmq.com/nack.html

Edit

For Google Cloud Pub/Sub, you can also send a Nack.

https://pkg.go.dev/cloud.google.com/go/pubsub#Message

"Ack indicates successful processing of a Message. If message acknowledgement fails, the Message will be redelivered. Nack indicates that the client will not or cannot process a Message. Nack will result in the Message being redelivered more quickly than if it were allowed to expire."

-- Olivercodes
Source: StackOverflow