How to design a Rule Engine Validation System to be asynchronous and highly performant?

https://softwareengineering.stackexchange.com/questions/342615

07-01-2021

Question

I am implementing a validation system that has some constraints. There are categories of users, say 30, and each category has a number of rules. In total there are about 100 different rules.

I have a multimap that maps a given user_category_id to a list of rule_ids.

Constraint 1. A rule should only execute if it is enabled for that user category. Rules are configured differently depending on whether the validation is for an update or an insert.
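A minimal Python sketch of that lookup, assuming the enabled rules are stored per (user_category_id, operation) pair so that insert and update can be configured differently. Python has no built-in multimap, so a `defaultdict(list)` stands in for one; `Operation`, `enable`, and `rules_for` are hypothetical names used only for illustration.

```python
from collections import defaultdict
from enum import Enum


class Operation(Enum):
    INSERT = "insert"
    UPDATE = "update"


# Hypothetical configuration: (user_category_id, operation) -> enabled rule_ids.
# defaultdict(list) plays the role of a multimap: one key, many values.
enabled_rules: defaultdict[tuple[int, Operation], list[str]] = defaultdict(list)


def enable(category_id: int, op: Operation, rule_id: str) -> None:
    """Mark rule_id as enabled for this category and operation (no duplicates)."""
    ids = enabled_rules[(category_id, op)]
    if rule_id not in ids:
        ids.append(rule_id)


def rules_for(category_id: int, op: Operation) -> list[str]:
    """Return the rule_ids to run; .get avoids creating empty entries."""
    return list(enabled_rules.get((category_id, op), []))


enable(7, Operation.INSERT, "R001")
enable(7, Operation.INSERT, "R002")
enable(7, Operation.UPDATE, "R002")
print(rules_for(7, Operation.INSERT))  # ['R001', 'R002']
print(rules_for(7, Operation.UPDATE))  # ['R002']
print(rules_for(8, Operation.INSERT))  # []
```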

Here I am thinking of using the Strategy pattern to validate the rules.

From the context I will pass as parameters the user_category_id, the list of rule_ids, and whether the operation is an update or an insert. I feel each rule should be a separate class, with a method that returns its rule_id and an execute method that runs the rule for the given user category, with logic that depends on whether it is an update or an insert.

So the context class will hold the multimap from user_category_id to rule_ids, and a value indicating whether the validation is being done for an update or an insert.
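A minimal Python sketch of that Strategy-based design, assuming two hypothetical rules, `R001` and `R002`, and a record passed as a plain dict. `Rule` plays the role of the question's `IStrategy`; the rule classes, `ValidationContext`, and `failed_rules` are illustrative names, not an existing API, and returning the list of failed rule_ids is only one possible choice.

```python
from abc import ABC, abstractmethod
from enum import Enum


class Operation(Enum):
    INSERT = "insert"
    UPDATE = "update"


class Rule(ABC):
    """Strategy interface (the question's IStrategy); one subclass per rule."""
    rule_id: str

    @abstractmethod
    def validate(self, record: dict, op: Operation) -> bool:
        """Return True if the record passes this rule for the given operation."""


class NameRequired(Rule):
    rule_id = "R001"  # hypothetical rule: a name must be present

    def validate(self, record, op):
        return bool(record.get("name"))


class AgeInRange(Rule):
    rule_id = "R002"  # hypothetical rule: on UPDATE a missing age is allowed

    def validate(self, record, op):
        if "age" not in record:
            return op is Operation.UPDATE
        return 0 <= record["age"] <= 150


class ValidationContext:
    """Holds the category -> rule_ids multimap and the current operation."""

    def __init__(self, category_rules: dict[int, list[str]], rules: list[Rule], op: Operation):
        self.category_rules = category_rules
        self.registry = {r.rule_id: r for r in rules}  # rule_id -> strategy object
        self.op = op

    def failed_rules(self, category_id: int, record: dict) -> list[str]:
        """Run every rule enabled for the category; return the ids that failed."""
        return [
            rid
            for rid in self.category_rules.get(category_id, [])
            if not self.registry[rid].validate(record, self.op)
        ]


ctx = ValidationContext({7: ["R001", "R002"]}, [NameRequired(), AgeInRange()], Operation.UPDATE)
print(ctx.failed_rules(7, {"name": "Ana"}))  # []
print(ctx.failed_rules(7, {"age": 200}))     # ['R001', 'R002']
```

With one small class per rule, adding a rule means writing a class and registering it; the context never changes.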

The IStrategy interface will declare the method signature, but I am not sure what its return type should be. There is a Response class which encapsulates my response and is sent to the user.
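One possible answer to the return-type question, offered as a sketch: each rule returns a small immutable result object, the caller aggregates them, and only the aggregate is mapped onto the existing Response class. `RuleResult` and `ValidationReport` are hypothetical names; the real Response class is not shown in the question, so that final mapping is left out.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RuleResult:
    """Hypothetical per-rule return type: outcome plus a message for the user."""
    rule_id: str
    passed: bool
    message: str = ""


@dataclass
class ValidationReport:
    """Hypothetical aggregate that could later be copied into the Response class."""
    results: list[RuleResult] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        # An empty report counts as valid (no rule failed)
        return all(r.passed for r in self.results)

    @property
    def errors(self) -> list[str]:
        return [f"{r.rule_id}: {r.message}" for r in self.results if not r.passed]


report = ValidationReport([
    RuleResult("R001", True),
    RuleResult("R002", False, "age must be between 0 and 150"),
])
print(report.ok)      # False
print(report.errors)  # ['R002: age must be between 0 and 150']
```

Keeping rules unaware of Response lets each rule be unit-tested in isolation, and the conversion to the user-facing response happens in one place.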

The last part is how to design the rule logic. There are over a hundred different rules; how should I approach this? Should I encapsulate each of the 100+ rules in a separate class? I feel I should be using the Strategy pattern, but I am not clear on how to implement it, and I need some guidance. Let me know if more information is required for this design. Any suggestions and guidance for improving the design will be highly appreciated.

Answer

This answer is incomplete: it does not address your question on "how to design", i.e. whether to encapsulate the rules in classes or not.

But in the interest of efficiency, I think you can separate your rules into groups based on their dependencies. That is:

Let A and B be two rules, both applying to the same categories.

There are three cases:

1. A has the same truth value as B (both either apply or don't apply).
2. A has the opposite truth value of B.
3. A and B are not reliably dependent.

If case 1 holds, you can place A and B together in a single "rule group" and test only one rule per group. The truth of that one rule is the truth of the group.

If case 2 holds, you can place A and B in different groups connected by a "meta-rule" that updates the truth of one group once the other is known. A set of groups connected this way forms a family.

If case 3 holds, the groups of A and B are not connected.

A group that is not connected to any other group is a family of its own.
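A minimal sketch of how groups and families could be computed, assuming the pairwise relations between rules are already known and consistent. Groups are the connected components of case-1 edges; families are the components of case-1 and case-2 edges taken together. Union-find is a swapped-in technique here, and `Relation`, `DisjointSet`, and `groups_and_families` are hypothetical names.

```python
from enum import Enum


class Relation(Enum):
    SAME = 1      # case 1: A and B always have the same truth value
    OPPOSITE = 2  # case 2: A and B always have opposite truth values
    # case 3 (no reliable dependency) is simply the absence of an edge


class DisjointSet:
    """Minimal union-find used to merge rules into connected components."""

    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

    def components(self):
        out = {}
        for x in self.parent:
            out.setdefault(self.find(x), []).append(x)
        return sorted(sorted(c) for c in out.values())  # deterministic order


def groups_and_families(rules, relations):
    """relations: iterable of (rule_a, rule_b, Relation)."""
    groups = DisjointSet(rules)
    families = DisjointSet(rules)
    for a, b, rel in relations:
        if rel is Relation.SAME:
            groups.union(a, b)
        families.union(a, b)  # both SAME and OPPOSITE edges join a family
    return groups.components(), families.components()


rules = ["A", "B", "C", "D", "E"]
relations = [("A", "B", Relation.SAME), ("B", "C", Relation.OPPOSITE)]
groups, families = groups_and_families(rules, relations)
print(groups)    # [['A', 'B'], ['C'], ['D'], ['E']]
print(families)  # [['A', 'B', 'C'], ['D'], ['E']]
```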

Now, instead of 100 rules, you might have 90 groups and 80 families, so you only need 80 checks. If the rule checks have different costs, for each family you run the cheapest rule.
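A sketch of checking a whole family with a single rule evaluation, assuming every rule in the family is linked to the others through case-1 (same) or case-2 (opposite) edges, that the relations contain no contradictory cycles, and that per-rule costs are known. Each rule gets a parity relative to an arbitrary root; after running only the cheapest rule, the others follow by XOR. `family_parity`, `evaluate_family`, and the `(a, b, same)` edge format are illustrative assumptions.

```python
from collections import deque


def family_parity(family, edges):
    """Map each rule to 0/1: 0 if it always matches the root's truth, 1 if opposite.

    edges: list of (a, b, same), same=True for case 1 and False for case 2.
    Assumes the family is connected by these edges and contains no contradictions.
    """
    adj = {r: [] for r in family}
    for a, b, same in edges:
        if a in adj and b in adj:
            flip = 0 if same else 1
            adj[a].append((b, flip))
            adj[b].append((a, flip))
    root = family[0]
    parity = {root: 0}
    queue = deque([root])
    while queue:  # breadth-first walk propagating parities
        r = queue.popleft()
        for nxt, flip in adj[r]:
            if nxt not in parity:
                parity[nxt] = parity[r] ^ flip
                queue.append(nxt)
    return parity


def evaluate_family(family, edges, cost, check):
    """Run only the cheapest rule in the family and infer all the others."""
    parity = family_parity(family, edges)
    cheapest = min(family, key=lambda r: cost[r])
    result = check(cheapest)
    # A rule's truth flips relative to the checked rule when their parities differ
    return {r: result ^ bool(parity[r] ^ parity[cheapest]) for r in family}


edges = [("A", "B", True), ("B", "C", False)]  # A same as B, B opposite of C
cost = {"A": 5, "B": 1, "C": 3}
calls = []


def check(rule_id):
    calls.append(rule_id)
    return True  # pretend the cheapest rule passes


result = evaluate_family(["A", "B", "C"], edges, cost, check)
print(result)  # {'A': True, 'B': True, 'C': False}
print(calls)   # ['B']
```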

Since families are independent by construction, you can run their checks in parallel if you want.
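A minimal asyncio sketch of running the family checks concurrently, assuming each check is I/O-bound (for example a database or service lookup). `check_family` is a hypothetical stand-in that simulates latency with `asyncio.sleep` and treats any family id starting with `bad` as failing; `validate_all` is likewise an illustrative name.

```python
import asyncio


async def check_family(family_id: str, delay: float) -> tuple[str, bool]:
    """Hypothetical async check standing in for a DB or service call."""
    await asyncio.sleep(delay)  # simulated I/O latency
    return family_id, not family_id.startswith("bad")


async def validate_all(families: dict[str, float]) -> dict[str, bool]:
    # Families are independent, so their checks can run concurrently.
    results = await asyncio.gather(*(check_family(f, d) for f, d in families.items()))
    return dict(results)


print(asyncio.run(validate_all({"f1": 0.02, "f2": 0.01, "bad_f3": 0.01})))
# {'f1': True, 'f2': True, 'bad_f3': False}
```

`asyncio.gather` waits for every family and returns results in input order. For CPU-bound checks, a process pool such as `concurrent.futures.ProcessPoolExecutor` would be a better fit than asyncio.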

(Actually this is a simplification: it does not exploit any knowledge about the joint probability of the rules, and only deals with certainties. To be able to exploit less-than-certain knowledge, things would need to become much more complicated.)

For each family you also keep statistics: how many times its check passed, and how many times it failed and the rule checking was aborted by that family.

Then you check the families in order of descending efficiency, starting from the most efficient family: the one with the lowest product of (passes/total) and its cost. Over time, the rules will sort themselves into the most efficient order. Rules that always pass will float to the bottom and be checked last (you may want to review them and possibly disable them altogether).
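A sketch of this self-ordering loop, assuming family checks stop at the first failure. `FamilyStats`, `order`, and `validate` are hypothetical names. The pass rate uses Laplace smoothing, (passes + 1) / (total + 2), an assumption added so that families with no history start at 0.5 instead of dividing by zero; the priority is the answer's pass rate multiplied by cost, lowest first.

```python
from dataclasses import dataclass


@dataclass
class FamilyStats:
    cost: float
    passes: int = 0
    failures: int = 0  # times this family failed and aborted the validation

    def pass_rate(self) -> float:
        # Laplace smoothing (an assumption) so unseen families start at 0.5
        return (self.passes + 1) / (self.passes + self.failures + 2)

    def priority(self) -> float:
        # The answer's ordering key: pass rate multiplied by cost, lowest first
        return self.pass_rate() * self.cost


def order(stats: dict[str, FamilyStats]) -> list[str]:
    return sorted(stats, key=lambda f: stats[f].priority())


def validate(stats: dict[str, FamilyStats], check) -> bool:
    """Check families in priority order, updating stats; stop at the first failure."""
    for fid in order(stats):
        if check(fid):
            stats[fid].passes += 1
        else:
            stats[fid].failures += 1
            return False
    return True


stats = {"cheap_always_pass": FamilyStats(cost=1.0), "pricey_often_fails": FamilyStats(cost=3.5)}
for _ in range(10):
    validate(stats, lambda fid: fid != "pricey_often_fails")
print(order(stats))  # ['pricey_often_fails', 'cheap_always_pass']
```

After a few runs the expensive family that keeps failing moves to the front, so the cheap check is no longer spent on requests that get rejected anyway. If the family checks are independent, a known result for fail-fast sequences is that ordering by cost divided by failure rate minimizes the expected cost, and it also places families that always pass last; switching to it only means changing `priority()`.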

(Again, we apply this framework blindly to all users, without exploiting any knowledge about them (profiling). In general, rules will match different "types" of users with different probabilities. Exploiting this additional knowledge would again make things more complicated, although part of that complexity can be translated into more rules without complicating the framework.)

Licensed under CC-BY-SA with attribution
