Question

Currently we have a batch-driven process at work which runs every 15 minutes, and every time it runs it repeats this cycle several times:

  1. Calls a sproc and gets some data back from the DB
  2. Processes the data
  3. Saves the results back to the DB

It can't load all the data in one go because the data are segregated by a number of fields, and each group of data requires different behaviour during processing (configurable from a front end). However, recent changes in the business have resulted in a sudden surge in the volume of data (and therefore the processing time required) for some of the groups, so now whenever one group overruns it delays all the others.
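To give a feel for it, the current cycle looks roughly like the sketch below (Python for illustration; the connection string, sproc names and the process_group body are hypothetical stand-ins, not our actual code):

    import pyodbc

    # Connection string and sproc names are placeholders, not our real ones.
    CONN_STR = ("DRIVER={ODBC Driver 17 for SQL Server};"
                "SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes")

    def process_group(group_key, rows):
        """Stand-in for the configurable per-group processing behaviour."""
        return [(group_key, len(rows))]

    def run_cycle(group_keys):
        """One 15-minute run: load, process and save each group in turn."""
        conn = pyodbc.connect(CONN_STR)
        try:
            for key in group_keys:
                cur = conn.cursor()
                # 1. call a sproc to fetch this group's data
                rows = cur.execute("{CALL usp_LoadGroupData (?)}", key).fetchall()
                # 2. process it according to the group's configured behaviour
                results = process_group(key, rows)
                # 3. save the results back via another sproc
                for group_key, value in results:
                    cur.execute("{CALL usp_SaveGroupResult (?, ?)}", group_key, value)
                conn.commit()  # a slow group here delays every group after it
        finally:
            conn.close()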

Our plan is to parallelise this process across multiple machines so that:

  • there is a central controller (master) and several workstations (slaves)
  • master is responsible for scheduling the runs (configurable from a front end)
  • master (or a separate component) is responsible for loading/saving data to and from the DB (in order to avoid deadlocks/contention between the multiple slaves)
  • slaves receive work items, process them and return the results to master
  • there is a primary slave (main production server in our environment) which will usually receive all the work items
  • secondary slaves will receive work only if the primary slave is working on a group which requires longer processing time (the master can identify this from the size of the data returned, or it can be left to configuration; see the routing sketch after this list)
  • if a slave throws an exception during processing, an alert email is sent to the support team, and the same work item is picked up during the next schedule cycle
  • not sure what to do with timeouts yet
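To make the routing rule concrete, here is a minimal sketch of how the master might pick a slave for a work item (the Slave and WorkItem types and the size threshold are illustrative assumptions, not part of the actual design):

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class WorkItem:
        group_key: str
        payload_size: int  # bytes returned by the load sproc for this group

    @dataclass
    class Slave:
        """Stand-in for a remote worker; real calls would go over the network."""
        name: str
        busy: bool = False
        queue: List[WorkItem] = field(default_factory=list)

    # Assumed threshold for "requires longer processing time"; this could just
    # as well come from the same front-end configuration as the group behaviour.
    LONG_RUNNING_BYTES = 50 * 1024 * 1024

    def choose_slave(item: WorkItem, primary: Slave,
                     secondaries: List[Slave]) -> Slave:
        """Primary gets everything by default; spill to the least-loaded
        secondary only while the primary is tied up with a big group."""
        if primary.busy and secondaries and item.payload_size > LONG_RUNNING_BYTES:
            return min(secondaries, key=lambda s: len(s.queue))
        return primary

The same decision point would be a natural place to apply the failure rule above: on an exception from a slave, e-mail support and push the work item back onto the next cycle's queue.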

I have done some research on the Master-Slave pattern for distributed environments, but so far I haven't found much reference material. Does anyone here know of a good implementation of such a pattern? Any pointers on potential pitfalls of such an architecture would be much appreciated too!

Thanks,


Solution

Your Master/Slave design above seems to imply that the writes to the database will be serialised anyway, so have you considered simply running multiple copies of your current process in parallel (e.g. by forking a new process for each job) and managing contention via a shared application lock?
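For what it's worth, if this is SQL Server (the sprocs suggest it might be), "shared application lock" would typically mean sp_getapplock. Here is a minimal sketch of one parallel copy serialising its write phase that way (the resource name, timeout and save sproc are placeholders):

    import pyodbc

    def save_with_app_lock(conn, group_key, results):
        """Serialise the write phase across the parallel copies with a SQL
        Server application lock. pyodbc runs with autocommit off by default,
        so this batch already sits inside the transaction that sp_getapplock
        needs when @LockOwner = 'Transaction'."""
        cur = conn.cursor()
        rc = cur.execute("""
            SET NOCOUNT ON;
            DECLARE @rc int;
            EXEC @rc = sp_getapplock @Resource    = 'batch-writer',
                                     @LockMode    = 'Exclusive',
                                     @LockOwner   = 'Transaction',
                                     @LockTimeout = 30000;  -- wait up to 30 s
            SELECT @rc;
        """).fetchval()
        if rc < 0:  # -1 timeout, -2 cancelled, -3 deadlock victim
            raise TimeoutError("could not acquire the application lock")
        for value in results:
            cur.execute("{CALL usp_SaveGroupResult (?, ?)}", group_key, value)
        conn.commit()  # committing releases the transaction-owned lock

Because the lock is owned by the transaction, it is released at commit or rollback, so a crashed copy cannot leave it held.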
