Axon Server auto-scaling split/merge delay

6/10/2020

I am implementing auto-scaling in an application using Axon Server, and running in k8s.

I have created ReST endpoints in the application itself, which look at the local configuration (for processors and thread counts) and then speak to the Axon Server ReST API in order to split/merge the processors appropriately. The intent being to use container lifecycle hooks to trigger them.

As a result, if a new instance (pod) of an application is launched, configured for 2 threads on ProcessorA, then my code will make 2 requests to the /v1/components/blah/processors/ProcessorA/segments/split?context=default endpoint on the server. This is in order to make full use of the 2 new threads.

Likewise, when the pod is shut down, it makes 2 similar requests to the merge endpoint on the server.

When scaling up I see the processor split twice, as expected. However, on shutdown I don't see the merge twice unless I put a long (5s) wait between requests. This isn't likely to be particularly stable, so I'm wondering if there's something else I need to be doing.

Perhaps I ought to request the merge, then loop waiting for it to occur, then request another. This seems like it's going to be excessively slow.

There was another question on SO somewhat related, https://stackoverflow.com/questions/55121228/automatically-scale-axons-tracking-event-processors, where Steven commented that there was no inbuilt auto-scaling in Axon Server at that point in time. I've not seen anything in more recent times either.

-- ptomli
axon
kubernetes

1 Answer

6/11/2020

As it stands work is underway to improve the split/merge functionality. For one, the result of a split/merge will be returned, which has been resolved under issue #1001.

This should make it so you do not have to wait for the status' to have been updated, which is the likely cause why it (seems to) take long. This functionality will be part of Axon Framework / Server 4.4 by the way, which should be released relatively soon.

Subsequently, discussion are still underway to allow for auto scaling. One requirement deemed important is the capability of a TrackingEventProcessor to process several segments per thread (issue #1434). This will ensure that the TEP can take over several segments to transition the boundary when scaling, for example.

Eventually though, Axon Server should be able to do this for you. It's just not there yet.

So for now I think the most pragmatic solution is indeed to wait for the result to show up on the status'. As said, I trust 4.4 will improve upon this by returning the result of the split/merge operation once called. Lastly, the Axon team is aware this can be improved upon further, hence why discussion on the matter are underway.

-- Steven
Source: StackOverflow