Question

I have expressions like this:

(cat OR cats OR kitten OR kitty) AND (dog OR dogs) NOT (pigeon OR firefly)

Anyone having idea how to make tables to save those?

Before I got request for usage of brackets, I limited usage of operators to avoid ambiguous situations. So only ANDs and NOTs or only ORs and saved those in this manner:

operators

id | name  
1  | AND  
2  | OR  
3  | NOT  

keywords

id | keyword  
1  | cat  
2  | dog  
3  | firefly  

expressions

id | operator | keywordId  
1  | 0        | 1   
1  | 1        | 2  
1  | 3        | 3  

which was: cat AND dog NOT firefly

But now, I'm really puzzled...

Was it helpful?

Solution

I think you need a possibility to concat "subexpressions". Why not have a nullable foreignkey in Table Expressions linking to itself(to the parent expression)?

OTHER TIPS

Saving them as strings or serialized data structures (a parse tree for example) is most likely the best solution unless you really need to modify parts of the expression in the database itself.

I'd store them as reverse Polish in text format with operators/operands by blanks, for your examples:

cat cats OR dog dogs OR AND
pigeon firefly OR NOT

This allows you to implement an boolean expression evaluator really easily and simply, and I presume is what you want to with them.

If you wanted to make it even easier to evaluate, I'd store bindings of object names to a small vocabulary (e.g., A-Z) and a similar vocabulary for AND, OR, NOT:

cat A cats B dog C dogs : DAB+CD+&
pigeon A firefly : AB+~

Then the basic expression evaluator only has to work on invidual characters and is really, really easy to code.

In situations like this in the past I have created an integer column that bitwise operations can be performed against. Explanation below:

Begin by assigning a single binary digit to each value-

Cat  Dog  Firefly
---  ---  ------
1     2     4

Next you will add an integer column to your main table, we will call it options. When the number is converted to binary each digit will represent weather cats, dogs or fireflys are allowed. Example:

5 = 101 in binary = Cats allowed, dogs not allowed, fireflys allowed.

id | locationName | options
---------------------------
1  | loc 1        | 5
2  | loc 2        | 2
3  | loc 3        | 7
4  | loc 4        | 6

We can now use bitwise operations against the options column to determine what options are allowed. Examples:

To get all records that allow dogs when we don't about cats or fireflys you would perform the following bitwise operation:

2 & options = 2

This would return records 2,3 and 4.


To get all records that allow dogs and fireflys we would perform he following bitwise operation:

6 & options = 6

This would return records 3 and 4


To get all records that allow cats and fireflys we would perform the following bitwise operation:

5 & options = 5

This would return records 1 and 3.


ONLY accepts fireflys:

4 | Options = 4


Doesn't accept fireflys:

4 & options = 0


This is probably a hard concept to grasp so please let me know if you have any questions. It seems to me that it might be the simplest way to accomplish what you are trying to do once you can grasp the concept.

(cat OR cats) AND (dog OR dogs) NOT (pigeon OR firefly)

and

cat AND dog NOT firefly

Are both invalid boolean expressions as NOT is a unary operator (it only takes 1 operand).

Having said that, this is a toughie. It's hard enough to do localization on a database level. Hmmm.. The below comes to mind:

Expressions

Id|Operator|Keyword1|Keyword2|Expression1|Expression2

So here, keyword1, keyword2, expression1 and expression2 are all nullable, and every expression is stored as either a unary operation on a keyword or another expression, or a binary operation on 0, 1 or 2 keywords and 0, 1, 2 other expressions. 1 record per expression, with aditional records for all subexpressions. You can have the evaluation in code done recursively.

This way you won't have duplicate Ids. The only downside I see is that it would be hard to preserve things like (CAT AND CATS) AND DOG vs CAT AND (CATS AND DOG). Both of these evaluate to the same, but the order of computation changes.

Pretty sure this would work. Hit me up in the comments if you need more details.

Plamen

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top