One microservice doing many transformations or many microservices doing one transformation?

1/14/2019

I have a service that transforms XML documents. It receives messages from N queues and, depending on which queue a message comes from, runs one of the transformations.

Now I'm refactoring it into a microservice running in a Docker container. I see two ways to do it, but I don't know which would be the better practice when using containers.

I'm doing this in .NET Core 2.2. I don't know yet whether we will use Docker Swarm or Kubernetes in production.

  1. I could leave most of the code as it is and run it as one container. AppSettings.config would contain the settings for every queue-transformation pair.

    The pros I see: fewer code changes needed and a simpler docker-compose file - one service instead of N. Only one config file.

    The cons: if there is higher demand for one of the transformations, or if it has higher priority, I can't just scale its container. New replicas would pull messages from all queues instead of just the one I want to scale for.

  2. I could refactor the code and build N different container images. Each of them would listen to one queue and run one transformation. I can refactor it so that the only difference in code is in Startup, when registering services: based on configuration I would register one of the IXmlTransformers.

    Pros: I can scale per queue-transformation pair. If there is a problem with one of the transformations and its container restarts, all the other transformations keep working normally. I also think it's cleaner - one image doing one transformation follows the single-responsibility principle.

    Cons: in some environments there will be around 10 different transformations, so there will be a lot of configuration files - probably added to each running container using a volume? The docker-compose file will be very long.

    What would be the better way to do it? Or are there other, better options?
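As a rough sketch of option 2 (service names, the image name, and the environment variable names here are all made up), a docker-compose file could reuse one image for every queue and configure each service through the environment, so that a single transformation can be scaled on its own. Note that `deploy.replicas` is honored by Docker Swarm via `docker stack deploy`:

```yaml
version: "3.7"
services:
  transform-invoices:
    image: myregistry/xml-transformer:latest
    environment:
      QUEUE_NAME: invoices        # which queue this instance listens to
      TRANSFORMER_TYPE: Xslt      # which ITransformer to register in Startup
    deploy:
      replicas: 3                 # scale only this transformation
  transform-orders:
    image: myregistry/xml-transformer:latest
    environment:
      QUEUE_NAME: orders
      TRANSFORMER_TYPE: Csv
    deploy:
      replicas: 1
```

With this shape the compose file does get long with ~10 transformations, but each entry differs only in its environment block, so it stays mechanical to maintain or generate.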

Edit:

I think I wasn't clear with my question, so I'll try to explain it better. In both options I would have one container image; the difference would only appear at runtime, through configuration. Let's say I'm transforming XML to CSV, or XML to XML using XSLT. The CSV format and the XSLT are part of the configuration. I can have many CSV formats (many configurations) and many different XSLTs (again, many configurations). So I have two ITransformer implementations (CSV, XSLT), and they read configuration to run their transformations. My question is whether it is better to have one running container instance per config, or to put all those configs into one container that either monitors N queues, or monitors one queue and reads some kind of metadata to decide which transformation to run.
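To make the "one image, config decides" idea concrete, here is a minimal self-contained sketch (all class and key names are invented; in the real Startup this would be a services.AddSingleton registration driven by IConfiguration rather than a factory called from Main):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: one image, the transformation chosen from configuration.
public interface ITransformer
{
    string Transform(string xml);
}

public class CsvTransformer : ITransformer
{
    // Placeholder: a real implementation would apply a configured CSV format.
    public string Transform(string xml) => "csv:" + xml;
}

public class XsltTransformer : ITransformer
{
    // Placeholder: a real implementation would apply a configured XSLT.
    public string Transform(string xml) => "xslt:" + xml;
}

public static class TransformerFactory
{
    // Maps a configuration value to a concrete ITransformer.
    static readonly Dictionary<string, Func<ITransformer>> Registry =
        new Dictionary<string, Func<ITransformer>>
        {
            ["Csv"]  = () => new CsvTransformer(),
            ["Xslt"] = () => new XsltTransformer(),
        };

    public static ITransformer Create(string type) => Registry[type]();
}

public static class Program
{
    public static void Main()
    {
        // In the real service this value would come from configuration,
        // e.g. an environment variable set per container.
        ITransformer transformer = TransformerFactory.Create("Xslt");
        Console.WriteLine(transformer.Transform("<doc/>"));
    }
}
```

The same binary can then play either role, and "one instance per config" versus "one instance for all configs" becomes purely a deployment decision.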

-- Piotr Perak
.net-core
docker
docker-swarm
kubernetes
microservices

2 Answers

1/14/2019

The cons: if there is higher demand for one of the transformations, or if it has higher priority, I can't just scale its container. New replicas would pull messages from all queues instead of just the one I want to scale for.

Why is it an issue that all queues are being pulled? Is it because you usually have a large backlog in the queues, or are you afraid of provisioning overcapacity? When all queues are up to date with processing, only the one queue that you wanted to scale for should be picked up anyway.

In case you can't do that, I would still go with a generic service that can pull from all queues, but try to incorporate a weighting or other rule set - based on queue length, timestamps of queue entries, or whatever is appropriate in your case - to regulate which queue is pulled with what priority. If the right result parameter of this logic is exposed, it can also be used to auto-scale your cluster if you fall too far behind.
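As a rough sketch of such a rule set (the queue names, backlog counts, and weights are invented), one simple scheme is to always pull next from the queue whose backlog length times a per-queue priority factor is highest:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: decide which queue to poll next by weighting
// the current backlog length with a per-queue priority factor.
public static class QueuePicker
{
    public static string PickNext(
        IReadOnlyDictionary<string, int> backlog,   // queue -> pending messages
        IReadOnlyDictionary<string, double> weight) // queue -> priority factor
    {
        return backlog
            .Where(kv => kv.Value > 0)                              // skip empty queues
            .OrderByDescending(kv => kv.Value * weight[kv.Key])     // highest score first
            .Select(kv => kv.Key)
            .FirstOrDefault();                                      // null when all empty
    }

    public static void Main()
    {
        var backlog = new Dictionary<string, int>    { ["invoices"] = 10,  ["orders"] = 40  };
        var weight  = new Dictionary<string, double> { ["invoices"] = 5.0, ["orders"] = 1.0 };
        // invoices scores 10 * 5 = 50, orders scores 40 * 1 = 40,
        // so the high-priority queue wins despite its shorter backlog.
        Console.WriteLine(QueuePicker.PickNext(backlog, weight));
    }
}
```

The score itself (or the per-queue backlog) is the "result parameter" that could be exported as a metric and fed to an autoscaler.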

The alternative of manually scaling based on individual queues sounds like it could turn into an admin nightmare.

-- Oswin Noetzelmann
Source: StackOverflow

1/14/2019

When the name of the queue or event carries the context of which job should run, it shouldn't be a problem to use the same code in multiple containers, even with the same config. I don't know your setup, but I would avoid binding one container to one queue. Containers shouldn't be deployed with state when you can avoid it.

Depending on the number of files and the budget you have, you could also use an approach with AWS Lambda and S3. There is no need for any message queueing or your own infrastructure when you go with S3 and different bucket names that trigger a Lambda function on upload. It's not a better approach, just a different one, and it depends heavily on your needs. It can be cheaper or much more expensive than your approach, which again depends on your use case.

-- user934801
Source: StackOverflow