Question

I want to distribute the processing of large batches. The idea is to use Spring Batch to fire up a bunch of AMQP consumers in a cloud, then load cheap tasks (such as item IDs) and submit them to an AMQP exchange. The consumers themselves will write the results.

Is there a ready-made library to accomplish this?

A few thoughts:

  • Spring Batch is totally negotiable.
  • The batch size is several million items. I don't want to kill my message broker by brute-force submitting all these IDs at once; I'd rather use some kind of throttling.
  • I do want to know which items have been processed so I can monitor progress, so the controlling batch process will have to receive replies from the consumers.

Solution

Yes, see the spring-batch-integration project. It combines Spring Batch and Spring Integration to perform what you want.

For Spring Batch 2.2.x, it's part of the spring-batch-admin distribution; in the upcoming 3.0.0 release it has been moved into Spring Batch proper.

With remote partitioning, only metadata describing each partition is sent over the transport; the workers fetch the actual data themselves.
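
To make the "metadata only" idea concrete, here is a minimal plain-Java sketch of splitting an ID range into per-worker ranges. It deliberately avoids Spring so it runs on its own; `IdRangePartitions`, `Range` and `split` are hypothetical names, and contiguous numeric IDs are an assumption. In Spring Batch itself this logic would live in a `Partitioner` implementation, whose `partition(int gridSize)` method returns a `Map<String, ExecutionContext>`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Plain-Java sketch (not Spring Batch code): build partition metadata only. */
final class IdRangePartitions {

    /** Inclusive ID range handed to one worker; the worker loads the rows itself. */
    static final class Range {
        final long minId;
        final long maxId;

        Range(long minId, long maxId) {
            this.minId = minId;
            this.maxId = maxId;
        }
    }

    /**
     * Splits [minId, maxId] into at most gridSize contiguous ranges.
     * Assumes IDs are numeric and reasonably dense (an assumption of this sketch).
     */
    static Map<String, Range> split(long minId, long maxId, int gridSize) {
        if (gridSize <= 0 || maxId < minId) {
            throw new IllegalArgumentException("need gridSize > 0 and minId <= maxId");
        }
        long total = maxId - minId + 1;
        long size = (total + gridSize - 1) / gridSize; // ceiling division
        Map<String, Range> partitions = new LinkedHashMap<>();
        long start = minId;
        for (int i = 0; i < gridSize && start <= maxId; i++) {
            long end = Math.min(start + size - 1, maxId);
            partitions.put("partition" + i, new Range(start, end));
            start = end + 1;
        }
        return partitions;
    }
}
```

Each map entry is all that would need to travel over the broker; for several million IDs and, say, 100 partitions, that is 100 small messages rather than millions.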

It comes with a JMS example, but it wouldn't be hard to swap the Spring Integration JMS gateways for Spring Integration AMQP gateways.

There's also a remote chunking variant, in which the item data itself is sent over the transport instead of partition metadata.
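
On the throttling and monitoring concerns from the question: the usual pattern is to cap the number of chunks that are in flight and to free a slot whenever a consumer replies. As far as I recall, the remote chunking writer in spring-batch-integration (`ChunkMessageChannelItemWriter`) has a throttle limit setting for this purpose. The following is only a plain-Java sketch of the idea, not that class; `ThrottledSubmitter` and its methods are hypothetical names, and the `Runnable` stands in for whatever actually publishes to the AMQP exchange.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Sketch: bound the number of unacknowledged chunks sent to the broker. */
final class ThrottledSubmitter {

    private final int throttleLimit;
    private final Semaphore slots;
    private final AtomicLong acknowledged = new AtomicLong();

    ThrottledSubmitter(int throttleLimit) {
        this.throttleLimit = throttleLimit;
        this.slots = new Semaphore(throttleLimit);
    }

    /**
     * Waits up to the timeout for a free slot, then runs the send action.
     * Returns false if no slot became free in time or the thread was interrupted.
     */
    boolean submit(Runnable send, long timeout, TimeUnit unit) {
        try {
            if (!slots.tryAcquire(timeout, unit)) {
                return false;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
        try {
            send.run();
            return true;
        } catch (RuntimeException e) {
            slots.release(); // nothing was sent, so give the slot back
            throw e;
        }
    }

    /** Call exactly once per consumer reply; frees one slot and counts progress. */
    void onReply() {
        acknowledged.incrementAndGet();
        slots.release();
    }

    long acknowledgedCount() {
        return acknowledged.get();
    }

    int inFlightCount() {
        return throttleLimit - slots.availablePermits();
    }
}
```

The controlling process would call `submit` for each chunk of IDs and wire `onReply` to the reply queue listener; `acknowledgedCount()` then doubles as a simple progress indicator.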

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow