Question

Is Drools suitable for writing rules for stemming and/or part-of-speech (POS) tagging? Suggestions for a better rule language are welcome. I have read many papers in this field that use a rule-based approach, but none of them mention which library or framework was used to write the rules.

My rules are like the following:

if (length = 3 and first_letter in group1 and second_letter in group2) then ...
if (length = 3 and first_letter in group1 and second_letter not_in group2) then ...
if (length = 3 and first_letter not_in group1 and second_letter in group2) then ...
if (length = 3 and first_letter not_in group1 and second_letter not_in group2) then ...
if (length = 4...

... and so on.
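The four length-3 rules above differ only in the in/not-in outcome of two membership tests, so they can be stored as rows of a table instead of as separate if-statements. A minimal sketch, assuming hypothetical letter groups `GROUP1`/`GROUP2` and placeholder outcomes "A" to "D" (the real groups and actions are not given in the question):

```python
from typing import Optional

# Hypothetical letter groups; the real groups are not given in the question.
GROUP1 = set("abc")
GROUP2 = set("def")

# Each row: (length, first_letter in GROUP1?, second_letter in GROUP2?) -> outcome.
# "A".."D" are placeholders for the elided "then ..." actions.
RULE_TABLE = {
    (3, True, True): "A",
    (3, True, False): "B",
    (3, False, True): "C",
    (3, False, False): "D",
}


def classify(word: str) -> Optional[str]:
    """Return the outcome of the matching row, or None if no row matches."""
    if len(word) < 2:
        return None
    key = (len(word), word[0] in GROUP1, word[1] in GROUP2)
    return RULE_TABLE.get(key)
```

With this layout, adding the length-4 cases means adding rows to the table rather than writing more nested conditionals.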

The problem is that there are too many of these rules to handle. Imagine that there are ten letter groups and a case for each letter belonging to each group: I could easily end up with over a thousand rules to classify a word correctly. I wrote 30 of these rules in plain C# code, and that was enough to show me how inefficient this approach is. I already have my rules organized as a tree on paper; I just need the right framework to insert, represent, tweak, and test them.
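Since the rules already exist as a tree on paper, one option is to store that tree as data and walk it, so each tweak touches a single node. A minimal sketch under stated assumptions: the `Leaf`/`Test` node types, the groups, and the labels "A" to "D" and "UNKNOWN" are all hypothetical. The `paths` helper flattens the tree into one condition list per leaf, which is exactly the flat if-rule form shown earlier.

```python
from dataclasses import dataclass
from typing import Callable, Union

GROUP1 = set("abc")  # hypothetical letter group
GROUP2 = set("def")  # hypothetical letter group


@dataclass
class Leaf:
    label: str  # placeholder classification result


@dataclass
class Test:
    description: str                 # human-readable condition, used by paths()
    predicate: Callable[[str], bool]
    if_true: "Node"
    if_false: "Node"


Node = Union[Leaf, Test]


def evaluate(node: Node, word: str) -> str:
    # Follow the branch chosen by each test until a leaf is reached.
    while isinstance(node, Test):
        node = node.if_true if node.predicate(word) else node.if_false
    return node.label


def paths(node: Node, conditions: tuple = ()):
    """Yield (conditions, label) per root-to-leaf path; each path is one flat rule."""
    if isinstance(node, Leaf):
        yield list(conditions), node.label
    else:
        yield from paths(node.if_true, conditions + (node.description,))
        yield from paths(node.if_false, conditions + ("not " + node.description,))


def second_in_group2(yes: str, no: str) -> Test:
    return Test("second letter in group2", lambda w: w[1] in GROUP2, Leaf(yes), Leaf(no))


tree = Test(
    "length == 3", lambda w: len(w) == 3,
    if_true=Test("first letter in group1", lambda w: w[0] in GROUP1,
                 if_true=second_in_group2("A", "B"),
                 if_false=second_in_group2("C", "D")),
    if_false=Leaf("UNKNOWN"),
)
```

For example, `evaluate(tree, "axe")` follows length 3, first letter in group1, second letter not in group2, and returns "B".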

I hope my question is clear. Thank you.


Solution

You can certainly use Drools for that. Drools can handle many thousands of rules (I've seen kbases, i.e. knowledge bases, with 30k+ rules), far more complex than the ones you present above, without breaking a sweat.

The main issue I see is not the runtime but the maintenance of your rules. Given your use case, writing them all by hand looks like a lot of work no matter which language or engine you choose. Maybe you can use a decision table to define your rules, as that usually means a lot less "typing". Or maybe you can have a script generate all the rules for you. Drools supports both.
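As an illustration of the "script generates the rules" route, here is a minimal sketch that emits one DRL rule per in/not-in combination, using DRL's `in` / `not in` restrictions on a list of string literals. Everything domain-specific is an assumption: the fact class `Word` with `length`, `firstLetter`, and `secondLetter` properties is hypothetical, as are the group contents, and the consequence is left as a comment because the real actions are not given.

```python
from itertools import product

# Hypothetical letter groups; replace with the real ones.
GROUPS = {
    "group1": ["a", "b", "c"],
    "group2": ["d", "e", "f"],
}


def drl_list(letters):
    # Render ["a", "b"] as ("a", "b") for a DRL `in` / `not in` restriction.
    return "(" + ", ".join(f'"{c}"' for c in letters) + ")"


def generate_rules(length, tests):
    """tests: list of (field, group_name); returns one DRL rule per in/not-in combination."""
    rules = []
    for outcome in product([True, False], repeat=len(tests)):
        constraints = [f"length == {length}"]
        name_parts = [f"len{length}"]
        for (field, group), is_in in zip(tests, outcome):
            op = "in" if is_in else "not in"
            constraints.append(f"{field} {op} {drl_list(GROUPS[group])}")
            name_parts.append(f"{field}_{'in' if is_in else 'notin'}_{group}")
        rules.append(
            f'rule "{"_".join(name_parts)}"\n'
            "when\n"
            f"    $w : Word( {', '.join(constraints)} )\n"
            "then\n"
            "    // TODO: action for this case (hypothetical placeholder)\n"
            "end\n"
        )
    return rules
```

Calling `generate_rules(3, [("firstLetter", "group1"), ("secondLetter", "group2")])` yields the four length-3 rules from the question. The count doubles with every extra test (2^n rules for n tests), which is the growth described in the question, so the script (or a decision table) is what you maintain, not the generated rules. A real .drl file would also need a `package` declaration and an import for the fact class.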

Licensed under: CC-BY-SA with attribution