Question

I have a legacy service written in a procedural style which I am rewriting. I want to improve maintainability first and foremost; the code gets tweaked often as the business rules change over time. Hence, I want to decouple a lot of things, encapsulate as much of the logic as possible in its own objects, and make the code much easier to test.

I'm looking for help in how to think about the top-level design so as to make those kinds of improvements.

Current Service Description

The service basically matches up items and bidders. It does this dozens of times per minute. As a result, the availability of both items and bidders is rapidly changing during the process.

Constraints
  • There may be 1 to N match-making processes running simultaneously (N is currently less than 50, but we want to grow the business).

  • Each match-making "run" must complete in less than 1 second.

  • There could be tens of thousands of potential items to select from (not all will qualify at any given point; in practice the number is usually closer to a thousand).

  • There may be up to hundreds of bidders.

  • Once an item has been "won" it is removed from further consideration until it is restocked at some future point.

  • Bidders have various limitations on how many items they can bid upon, so there is some significant logic required to properly qualify bidders on each item.

Procedural Outline

As a result of the above constraints, the current procedural code looks something like this outline:

  1. Grab and reserve 100 potential items out of the database.

  2. Loop through the list of 100 items looking for cached items. (Items with matching bidders were cached after full evaluation as seen below.)

    a. If a cached item is found, send it to the bidder and release the entire working set back to the database. DONE. (Future improvement: evaluate the items still in the reserved list to build a future cache.)

    b. If no cached item found, continue.

  3. Loop through list of items, fully evaluating item and bidders:

    a. Is item still available? If no, next item. (go to 3)

    b. Else, get list of pre-qualified bidders.

  4. Are there pre-qualified bidders?

    a. No. Loop through "related items" linked to initial item, and look for pre-qualified bidders for each of them. Fall through when complete.

    b. Yes. Loop through list of pre-qualified bidders.

  5. Does pre-qualified bidder have current contract info?

    a. No. Remove from bidder list.

    b. Yes, next bidder.

At this point we have an item and a list (potentially zero in length) of bidders who can receive the item. There are a few final checks to make to fully qualify the bidder.

Remember that we are still within the large loop which started at (2) above looping through the reserved, working set of 100 potential items.

  6. If there are bidders remaining in the pre-qualified bidders list, send the item to those bidders for their final approval.

    a. Continue through the rest of the item list, and cache any further successful matches with bidders.

    b. If any remaining final checks fail, send that status in place of the actual item.

  7. When all items in the item list have been examined, release the database reservation.
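The outline above can be condensed into a single function with early returns. This is only a sketch of the control flow described in the outline; every name here is hypothetical, and the real qualification rules, caching, and database reservation logic would plug in where the stubs are:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Item:
    id: int
    available: bool = True
    related: List["Item"] = field(default_factory=list)   # "related items"
    bidder_ids: List[int] = field(default_factory=list)   # pre-qualified bidders

def find_match(items, cache, has_contract):
    """Return (item, bidders) for the first workable match, else (None, [])."""
    # Step 2: prefer an item cached from a previous full evaluation.
    for item in items:
        if item.id in cache:
            return item, cache[item.id]
    # Step 3: full evaluation of each reserved item.
    for item in items:
        if not item.available:            # 3a: skip items no longer available
            continue
        bidders = list(item.bidder_ids)   # 3b: pre-qualified bidders
        if not bidders:                   # 4a: fall back to related items
            for rel in item.related:
                bidders.extend(rel.bidder_ids)
        # Step 5: drop bidders lacking current contract info.
        bidders = [b for b in bidders if has_contract(b)]
        if bidders:
            return item, bidders
    return None, []
```

Each step in the outline becomes one small, independently testable piece of the function, and the early returns replace the need for flags or multi-level breaks.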

Thoughts

What has made this difficult to think about and encapsulate for me is the multiple places where behavior can change significantly in the middle of looping through the working set. The original code had statements to break out of multiple levels of loops within loops. It's been modified a bit to use a pair of flags indicating state, but it's still pretty convoluted.
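One common way to get rid of multi-level breaks and state flags is to extract each inner loop into its own small function, so that every break becomes a return. A minimal sketch of that refactoring, with hypothetical names:

```python
# Flag-based original style (condensed):
#
#   found = False
#   for item in items:
#       for bidder in bidders:
#           if matches(item, bidder):
#               found = True
#               break
#       if found:
#           break
#
# Extracting the inner loop turns both breaks into returns:

def first_qualified(item, bidders, matches):
    """Inner loop as a function: returns a bidder or None."""
    for bidder in bidders:
        if matches(item, bidder):
            return bidder
    return None

def first_match(items, bidders, matches):
    """Outer loop: the None check replaces the state flag."""
    for item in items:
        bidder = first_qualified(item, bidders, matches)
        if bidder is not None:
            return item, bidder
    return None, None
```

The behavior-changing conditions in the middle of the working set then read as ordinary return paths rather than flag manipulation.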

Because it has to be fast, and the time before data becomes out of date is short, figuring out how to encapsulate and decouple this has been hard for me to think about.

Edit to Add

In response to good questions below:

  • The loop process is kicked off by a human clicking in a web page. It's the "I'm ready for the next transaction I must manage" indicator. So the loop is really a way to manage finding the next transaction for each user.

  • The 100 record reservation is a tunable, arrived at by testing other values. Approximately 100 gives the best performance as defined by the least amount of wall clock time to return a matching item and bid to the person involved.

One can imagine the people as auctioneers, or as salespeople, in that they take the item and finalize the sale arrangement. They must verbally speak with the item's seller to confirm the details, and in the most significant situations also speak briefly with the buyer to confirm the sale.

The number 50 is a rounded-up number of the maximum number of these salespeople involved at any one time.

Regarding parallelism: this is a web application, so each evaluation process is kicked off by a click on the web site by one of those salespeople, of whom there are usually 2 to 50 (currently).

Solution

I think OO is fine for this, as are other approaches. As written, it seems like only two things drive ALL your logic:

  • Someone taking an action (bid, retraction, price change, etc.)
  • Time (auction ending)

As such there is no need to constantly run a procedural loop (the equivalent of polling) for anything other than time-based stuff like auction expirations, emails for auctions about to end, etc. Everything else can be event-driven based on user actions. If you separate your stateful classes from those that "do stuff" you'll be able to scale horizontally (much) more easily.
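As a rough sketch of that separation (all names here are hypothetical): state lives in one class that holds data only, and the "do stuff" handlers are stateless functions wired up through a small event bus, so each handler can be unit-tested in isolation and scaled out independently of where the state lives.

```python
from collections import defaultdict

class AuctionState:
    """Stateful: holds data only, no business logic."""
    def __init__(self):
        self.bids = defaultdict(list)  # item id -> list of bid amounts

class EventBus:
    """Routes events to stateless handlers."""
    def __init__(self):
        self.handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def publish(self, event_type, state, payload):
        for handler in self.handlers[event_type]:
            handler(state, payload)

# A "does stuff" handler: pure logic, trivial to test.
def record_bid(state, payload):
    state.bids[payload["item_id"]].append(payload["amount"])

bus = EventBus()
bus.subscribe("bid_placed", record_bid)

state = AuctionState()
bus.publish("bid_placed", state, {"item_id": 1, "amount": 50})
```

User actions (bid, retraction, price change) each become an event type; only the time-based work (expirations, reminder emails) still needs a scheduled loop.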

I'm not sure why your forebears over-complicated this; maybe there are elements you've left out, or that I'm not appreciating. If so, I'll try to revise accordingly based on your updates.

This seems like a fun one where you can improve code quality, performance, and scalability. Good luck and have fun!

Idea to Explore: One process adds items to be "managed" to a queue. User-initiated process pulls items from the queue to be "managed." This should let you scale out the time-bound logic more easily.
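A minimal sketch of that queue hand-off, using Python's in-process `queue.Queue` purely as a stand-in for whatever real queueing mechanism would be used (the function names are made up):

```python
import queue

work = queue.Queue()

# One process adds items that need to be "managed"...
def enqueue_items(items):
    for item in items:
        work.put(item)

# ...and each user-initiated request (a salesperson's click) pulls the next one.
def next_item_for(salesperson):
    try:
        return work.get_nowait()
    except queue.Empty:
        return None  # nothing to manage right now

enqueue_items(["item-a", "item-b"])
```

The producer side can do the expensive, time-bound qualification ahead of the click, so the user-facing pull is just a dequeue.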

OTHER TIPS

I don't think this is a good application for applying OO.  Whether it is or not, I don't think you'll get a lot of assistance with top-level OO design patterns.

The existing system is super complicated, with parallelism and complex logic, and has evolved into an approach that probably no one fully understands: 100 records are grabbed & locked — why 100?  50 parallel matching jobs doing this grabbing — why 50?  Cached records preferred over non-cached (not exactly a business rule).

I assert that you will not be able to replicate the existing system behavior starting with a new top-level design, which would probably be the fullest / most complete way to apply OO.

Does anyone know whether the system would do a better job of matching with different parameters?  What kind of tests do you have for quality of matching?  If none, I'd work on that first and foremost.  Are you leaving better matches on the table due to the performance constraints?  How do we characterize the trade-off in match quality vs. match speed?

Your system has significant parallelism, which OO does not really address. If you're looking for some kind of replacement, I'd be looking at the overall architecture, especially the parallelism (50 concurrent jobs and the grabs of 100). I'd ask questions about distributed systems vs. a single large-memory server handling the whole load. If a distributed system is necessary, what kind of sharding can be done to minimize commits and locking between servers?

You are right to focus on top-level design: though rather than having a goal of converting the system to OO, I would seek to improve both the functional quality of results and the non-functional performance aspects, via broad architectural approaches dealing with the above, and testing to verify improvement. If OO comes along with that, then great; fundamentally you want an understandable and performant architecture and design (then maybe OO or something else to help implement it).

Licensed under: CC-BY-SA with attribution