realtime message matching against tons of rules

https://stackoverflow.com/questions/15736134

31-03-2022

Question

I have a scenario in which messages come into my system at a peak rate of 300,000 per second, and the system must determine where each message goes (which client) by matching it against specific rules.

The challenge is that there are 10,000 clients (let's say each client has just one user-defined rule), so every incoming message has to be matched against every rule to determine where it should go (a message can go to several different clients if it matches their rules).

Now, let's be more specific.

For instance, a message consists of the following fields:

Message (type, region, level, ...)

A rule can be very complex, but for now let's take a simple one:

rule1: (type in (100, 200, 300) && region not in (A, B, C)) || level in (100)
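As a point of reference, here is a minimal Java sketch (Java 16+ for records) of rule1 written by hand as a `java.util.function.Predicate`, together with the naive routing loop described above: every message is tested against every client's rule. The `Message` record, `NaiveRouter`, `route` and the second client's rule are hypothetical illustrations, not part of the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

public class NaiveRouter {
    // Hypothetical message shape: only the three fields used by rule1.
    record Message(int type, String region, int level) {}

    private static final Set<Integer> TYPES = Set.of(100, 200, 300);
    private static final Set<String> EXCLUDED_REGIONS = Set.of("A", "B", "C");

    // rule1: (type in (100, 200, 300) && region not in (A, B, C)) || level in (100)
    static final Predicate<Message> RULE1 = m ->
            (TYPES.contains(m.type()) && !EXCLUDED_REGIONS.contains(m.region()))
            || m.level() == 100;

    // Naive routing: evaluate every client's rule, so the cost is O(#clients) per message.
    static List<String> route(Message m, Map<String, Predicate<Message>> rulesByClient) {
        List<String> targets = new ArrayList<>();
        for (Map.Entry<String, Predicate<Message>> e : rulesByClient.entrySet()) {
            if (e.getValue().test(m)) {
                targets.add(e.getKey());
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        Predicate<Message> regionA = m -> "A".equals(m.region()); // hypothetical second rule
        Map<String, Predicate<Message>> rules = Map.of("client1", RULE1, "client2", regionA);
        System.out.println(route(new Message(200, "D", 5), rules)); // [client1]
        System.out.println(route(new Message(200, "A", 5), rules)); // [client2]
    }
}
```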

Keep in mind that in the real world a message has around 50 fields, and rules can be much more complex than this.

So my questions are:

Is there any way to reduce the time needed to match a message?
Is it possible to 'merge all the rules into one'?
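One common way to "merge" many simple rules is to index them by field value instead of testing them one by one. The sketch below assumes each rule is (or has been split into) a conjunction of `field in (...)` / `field not in (...)` constraints, with at most one constraint per field per rule. Each such clause owns one bit, and for every field and every value mentioned in any rule, a `BitSet` of the clauses that accept that value is precomputed. Matching a message is then one hash lookup and one `BitSet.and` per field. A rule containing `||`, such as rule1, would be split into several clauses that map back to the same client. `BitsetRuleIndex`, `Constraint` and the string-valued message map are hypothetical names; this is a sketch of the technique, not a drop-in library.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class BitsetRuleIndex {
    // One constraint on one field; fields a rule does not mention accept any value.
    record Constraint(String field, Set<String> values, boolean negated) {
        boolean accepts(String v) { return negated != values.contains(v); }
    }

    private final int ruleCount;
    // field -> (mentioned value -> clauses accepting that value)
    private final Map<String, Map<String, BitSet>> byValue = new HashMap<>();
    // field -> clauses accepting any value no rule mentions (unconstrained or "not in")
    private final Map<String, BitSet> fallback = new HashMap<>();

    public BitsetRuleIndex(List<List<Constraint>> rules) {
        ruleCount = rules.size();
        // Collect every constrained field and every value mentioned for it.
        Map<String, Set<String>> mentioned = new HashMap<>();
        for (List<Constraint> rule : rules)
            for (Constraint c : rule)
                mentioned.computeIfAbsent(c.field(), k -> new HashSet<>()).addAll(c.values());

        for (Map.Entry<String, Set<String>> e : mentioned.entrySet()) {
            String field = e.getKey();
            BitSet other = new BitSet(ruleCount);
            Map<String, BitSet> perValue = new HashMap<>();
            for (String v : e.getValue()) perValue.put(v, new BitSet(ruleCount));
            for (int i = 0; i < ruleCount; i++) {
                Constraint c = constraintOn(rules.get(i), field);
                if (c == null || c.negated()) other.set(i); // unmentioned values pass these
                for (Map.Entry<String, BitSet> pv : perValue.entrySet())
                    if (c == null || c.accepts(pv.getKey())) pv.getValue().set(i);
            }
            byValue.put(field, perValue);
            fallback.put(field, other);
        }
    }

    private static Constraint constraintOn(List<Constraint> rule, String field) {
        for (Constraint c : rule) if (c.field().equals(field)) return c;
        return null;
    }

    /** Returns the indices of every clause the message satisfies. */
    public BitSet match(Map<String, String> message) {
        BitSet result = new BitSet(ruleCount);
        result.set(0, ruleCount); // start from "every clause matches"
        for (Map.Entry<String, Map<String, BitSet>> e : byValue.entrySet()) {
            String value = message.get(e.getKey());
            result.and(e.getValue().getOrDefault(value, fallback.get(e.getKey())));
        }
        return result;
    }

    public static void main(String[] args) {
        // Clause 0: type in (100, 200, 300) && region not in (A, B, C)
        // Clause 1: level in (100)   (rule1's "||" branch split into its own clause)
        List<List<Constraint>> rules = List.of(
                List.of(new Constraint("type", Set.of("100", "200", "300"), false),
                        new Constraint("region", Set.of("A", "B", "C"), true)),
                List.of(new Constraint("level", Set.of("100"), false)));
        BitsetRuleIndex index = new BitsetRuleIndex(rules);
        System.out.println(index.match(Map.of("type", "200", "region", "D", "level", "5")));   // {0}
        System.out.println(index.match(Map.of("type", "200", "region", "A", "level", "100"))); // {1}
    }
}
```

Each `and` still touches about `ruleCount / 64` machine words per field, so the cost grows with the number of rules, but far more slowly than evaluating 10,000 rules individually.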

One idea I have is an FSM (finite-state machine), but I am not familiar with them at all, so any hints would be appreciated.

EDIT:

I gave Drools a try, and it turned out to be somewhat feasible, but its performance is far from good enough here (it may well be fine for most other use cases).

In my case it can only process around 5,000 messages per second, but I have 300,000 per second at peak. So now I suspect that a rule engine cannot meet my requirements.
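Putting the numbers from the question side by side makes the gap concrete. This is a rough single-threaded budget that ignores parallelism and assumes the naive approach evaluates every rule for every message:

```latex
\begin{aligned}
t_{\text{budget}} &= \frac{1\ \text{s}}{300{,}000} \approx 3.33\ \mu\text{s per message} \\
t_{\text{Drools}} &= \frac{1\ \text{s}}{5{,}000} = 200\ \mu\text{s per message},
  \qquad \frac{t_{\text{Drools}}}{t_{\text{budget}}} = \frac{300{,}000}{5{,}000} = 60 \\
300{,}000 \times 10{,}000 &= 3 \times 10^{9}\ \text{rule evaluations per second} \\
\frac{3.33\ \mu\text{s}}{10{,}000\ \text{rules}} &\approx 0.33\ \text{ns per rule}
\end{aligned}
```

About 0.33 ns per rule is roughly one clock cycle on a 3 GHz core, so evaluating all 10,000 rules one by one on a single thread cannot fit; either fewer rules must be evaluated per message, or each evaluation must be made much cheaper and the work spread across cores.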

Any ideas ?

Thanks a lot in advance.

Ben

Solution

Let me answer myself.

Actually, what I need is an expression evaluator rather than a full rule engine, and once I realized this I found GNU JEL (Java Expression Library), which has been around for years.

JEL compiles each expression once; clients then supply a runtime context (the instance to be matched) each time they evaluate it, which can be very fast. For instance, JEL can evaluate a single given expression more than 1,000,000 times per second (depending on the complexity of the expression and the data it is matched against).
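The compile-once, evaluate-many idea can be illustrated without JEL. The sketch below does not use JEL and does not reproduce its API: it is a hypothetical recursive-descent compiler (`RuleCompiler.compile`) for a tiny grammar modeled on rule1 (`field [not] in (v, ...)`, `&&`, `||`, parentheses) that turns the rule text into a tree of `java.util.function.Predicate` closures once. Each later evaluation is then just a few hash lookups, with no parsing. Messages are assumed to be `Map<String, String>` for simplicity.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

public class RuleCompiler {
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private RuleCompiler(String src) {
        // Tokenize: pad punctuation and operators with spaces, then split on whitespace.
        String spaced = src.replace("(", " ( ").replace(")", " ) ").replace(",", " , ")
                .replace("&&", " && ").replace("||", " || ");
        for (String t : spaced.trim().split("\\s+")) if (!t.isEmpty()) tokens.add(t);
    }

    /** Parses the rule once and returns a reusable predicate. */
    public static Predicate<Map<String, String>> compile(String src) {
        RuleCompiler c = new RuleCompiler(src);
        Predicate<Map<String, String>> p = c.orExpr();
        if (c.pos != c.tokens.size()) throw new IllegalArgumentException("Trailing input at token " + c.pos);
        return p;
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : null; }

    private String next() {
        if (pos >= tokens.size()) throw new IllegalArgumentException("Unexpected end of rule");
        return tokens.get(pos++);
    }

    private void expect(String t) {
        String got = next();
        if (!got.equals(t)) throw new IllegalArgumentException("Expected " + t + " but got " + got);
    }

    // orExpr := andExpr ("||" andExpr)*
    private Predicate<Map<String, String>> orExpr() {
        Predicate<Map<String, String>> p = andExpr();
        while ("||".equals(peek())) { next(); p = p.or(andExpr()); }
        return p;
    }

    // andExpr := atom ("&&" atom)*
    private Predicate<Map<String, String>> andExpr() {
        Predicate<Map<String, String>> p = atom();
        while ("&&".equals(peek())) { next(); p = p.and(atom()); }
        return p;
    }

    // atom := "(" orExpr ")" | field ["not"] "in" "(" value ("," value)* ")"
    private Predicate<Map<String, String>> atom() {
        if ("(".equals(peek())) {
            next();
            Predicate<Map<String, String>> p = orExpr();
            expect(")");
            return p;
        }
        String field = next();
        boolean negated = "not".equals(peek());
        if (negated) next();
        expect("in");
        expect("(");
        Set<String> values = new HashSet<>();
        values.add(next());
        while (",".equals(peek())) { next(); values.add(next()); }
        expect(")");
        // The value set is built once here; evaluation is a single hash lookup.
        if (negated) return m -> !values.contains(m.get(field));
        return m -> values.contains(m.get(field));
    }

    public static void main(String[] args) {
        Predicate<Map<String, String>> rule1 =
                compile("(type in (100, 200, 300) && region not in (A, B ,C)) || level in (100)");
        System.out.println(rule1.test(Map.of("type", "200", "region", "D", "level", "5")));   // true
        System.out.println(rule1.test(Map.of("type", "200", "region", "A", "level", "5")));   // false
        System.out.println(rule1.test(Map.of("type", "999", "region", "A", "level", "100"))); // true
    }
}
```

Compiling 10,000 such rules up front and reusing the predicates keeps parsing off the hot path; the per-message cost is then only the evaluation itself.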

So JEL is the most reasonable solution for me so far.

Hope this post can inspire you a bit.

Licensed under: CC-BY-SA with attribution
