I have a cloud-hosted database (AWS RDS, PostgreSQL) with a table of sources. Sources can be a web page or a social media account.
I have a cron job on my service that goes through each source and fetches updated information such as comments or stats.
Sometimes, if specific conditions are met, another action is triggered, e.g. if an Instagram post hits 1000 likes, comment with a string, or if a blog publishes a new post, send an email to subscribers.
I would like to scale my service horizontally with Docker and k8s. If I scale to two instances, there will be two cron jobs, and any given action could be sent twice. I do not want n emails to be sent for the n instances I've scaled to.
What is the correct architecture to handle this?
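If you want to horizontally scale the whole stack, split your domain by some reasonable key (say, creation date) into N partitions, and have each partition be a full stack. Each source then belongs to exactly one partition, so only one instance's cron job ever refreshes it and triggers its actions.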
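A minimal sketch of the partitioning idea, assuming each instance is told its own index and the total partition count (in Kubernetes, a StatefulSet gives each pod a stable ordinal that could serve as that index, but how you pass it in is up to you). The `Source` shape, `partition_for`, and `sources_for_instance` are hypothetical names. Taking the creation date's day number modulo N is just one way to turn "creation date" into a partition; date ranges would work too.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Source:
    # Hypothetical row shape for the "sources" table.
    source_id: int
    url: str
    created: date


def partition_for(source: Source, num_partitions: int) -> int:
    """Map a source to one of num_partitions buckets by its creation date.

    date.toordinal() is deterministic across processes and machines, so every
    instance computes the same bucket for the same row (unlike Python's
    built-in hash() on strings, which is randomized per process).
    """
    return source.created.toordinal() % num_partitions


def sources_for_instance(sources, instance_index: int, num_partitions: int):
    """Return only the sources this instance's cron job should refresh."""
    if not 0 <= instance_index < num_partitions:
        raise ValueError("instance_index must be in [0, num_partitions)")
    return [s for s in sources if partition_for(s, num_partitions) == instance_index]


if __name__ == "__main__":
    sources = [
        Source(1, "https://example.com/blog", date(2023, 1, 1)),
        Source(2, "https://example.com/feed", date(2023, 1, 2)),
        Source(3, "https://example.org/news", date(2023, 1, 3)),
    ]
    n = 2
    for i in range(n):
        print(i, [s.source_id for s in sources_for_instance(sources, i, n)])
```

Because the buckets are disjoint and together cover every source, each source (and therefore each triggered email) is handled by exactly one instance. The trade-off is that changing N reshuffles which instance owns which source.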
If you are concerned with scalability, you probably want to separate your stack into multiple layers (source refresher workers, action handlers, etc.) connected by work queues, so that each layer can be scaled independently. But I'd start with a straight domain partition at first.
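A rough in-process sketch of the layered version: a refresher layer enqueues triggered actions, and a pool of action handlers consumes them. Here Python's `queue.Queue` stands in for a real message broker, and `Action`, `refresher`, `action_handler`, and `run_pipeline` are hypothetical names. The point it illustrates is that each message is taken off the queue by exactly one handler, so adding handlers does not multiply emails. Real brokers often deliver at-least-once, so production handlers may still need a duplicate check.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    # Hypothetical message produced by the refresher layer.
    source_id: int
    kind: str  # e.g. "send_email" or "post_comment"
    payload: str


def refresher(source_ids, actions: "queue.Queue") -> None:
    """Refresher layer: fetch updates (stubbed) and enqueue triggered actions."""
    for sid in source_ids:
        # Stand-in for fetching comments/stats and evaluating trigger conditions.
        actions.put(Action(sid, "send_email", f"new post on source {sid}"))


def action_handler(actions: "queue.Queue", sent: list, lock: threading.Lock) -> None:
    """Action layer: each get() removes a message, so only one handler sees it."""
    while True:
        action = actions.get()
        try:
            if action is None:  # sentinel: shut this handler down
                return
            with lock:
                sent.append(action)  # stand-in for actually sending the email
        finally:
            actions.task_done()


def run_pipeline(source_ids, num_handlers: int = 3) -> list:
    actions: queue.Queue = queue.Queue()
    sent: list = []
    lock = threading.Lock()
    handlers = [
        threading.Thread(target=action_handler, args=(actions, sent, lock))
        for _ in range(num_handlers)
    ]
    for t in handlers:
        t.start()
    refresher(source_ids, actions)
    # FIFO order means every action is dequeued before any sentinel;
    # each handler exits after taking exactly one sentinel.
    for _ in handlers:
        actions.put(None)
    for t in handlers:
        t.join()
    return sent


if __name__ == "__main__":
    results = run_pipeline(list(range(5)), num_handlers=4)
    print(len(results), "actions sent")
```

Scaling the refresher and handler pools separately is what the layered design buys you; the domain partition above remains the simpler starting point.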