Partitioning in Kubernetes?

9/28/2017

I'd like to implement a data transformation application in Kubernetes. The application consists of a chain of logical operators:

A --- records R1 ---> B --- records R2 ---> C

  • The operator "A" generates records "R1" and passes them to the operator "B" (it's not important how the operator generates the records).
  • The operator "B" transforms the input records "R1" into the output records "R2" and passes them to the operator "C" in push mode.
  • The operator "C" processes the records "R2" (it's not important how the operator works).

If my transformation is simple, I can create the containers "A", "B", and "C" for the operators and put them into a single pod. I can then use the pod as a logical unit to start and stop my application.

But if the cost of my transformation is high, I need to scale my application. I'd like to increase the number of instances of the operator "B" and run the transformations in parallel. I'd also like to distribute the instances of the operator "B" across several Kubernetes nodes and have failover for them.

Also, I'd like to have a good interface for starting and stopping my application as a regular service.

Can I implement this kind of application in Kubernetes?

-- Sergey Golovko
kubernetes

1 Answer

9/28/2017

That really depends on how you pass data between services. Do they push data or pull it? Or do they write to some persistent storage? How many requests/records pass through the system, and what is their size?

All in all, IMO you should implement each service (A/B/C) as a separate Pod/Deployment. Then you can define Services for them so that they can call each other and pull or push data, if it's an API-based flow. Alternatively, you might put some kind of queue (e.g. Kafka or RabbitMQ) between them and pass messages, or just use a database where you'd store the records in appropriate tables.
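
To make the "separate Deployment per operator" idea a bit more concrete, here is a minimal sketch in Python that builds a Deployment manifest for operator "B" and prints it as JSON. The name "operator-b", the label "app: operator-b", and the image "example.com/operator-b:latest" are placeholders I made up for illustration, not anything from the question. Treat it as a starting point under those assumptions, not a finished setup.

```python
import json


def operator_b_deployment(replicas: int, image: str) -> dict:
    """Build an apps/v1 Deployment manifest for operator "B" as a plain dict."""
    # Hypothetical label; the selector and the pod template must agree on it.
    labels = {"app": "operator-b"}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "operator-b", "labels": labels},
        "spec": {
            # Number of parallel instances of operator "B".
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        # Placeholder image name, replace with your own.
                        {"name": "operator-b", "image": image}
                    ]
                },
            },
        },
    }


manifest = operator_b_deployment(replicas=3, image="example.com/operator-b:latest")
manifest_json = json.dumps(manifest, indent=2)

if __name__ == "__main__":
    print(manifest_json)
```

If you save the output to a file, `kubectl apply -f` accepts JSON as well as YAML, and `kubectl scale deployment operator-b --replicas=5` changes the number of instances later. The Deployment recreates pods that fail, and the scheduler is free to place the replicas on different nodes, though spreading is not guaranteed unless you add something like anti-affinity rules.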

Hard to give a more precise answer without a better understanding of the final objective.

-- Radek 'Goblin' Pieczonka
Source: StackOverflow