Datalog Stratification

https://stackoverflow.com/questions/12379775

01-07-2021

Question

So I'm trying to understand how Datalog works and one of the differences between it and Prolog is that it has stratification limitations placed upon negation and recursion. To quote Wikipedia:

If a predicate P is positively derived from a predicate Q (i.e., P is the head of a rule, and Q occurs positively in the body of the same rule), then the stratification number of P must be greater than or equal to the stratification number of Q

If a predicate P is derived from a negated predicate Q (i.e., P is the head of a rule, and Q occurs negatively in the body of the same rule), then the stratification number of P must be greater than the stratification number of Q.

So, going by this, the two following predicates do not result in a stratification error as they can simply be assigned the same stratification number. So these predicates are fine, despite the circular definition.

A(x) :- B(x)
B(x) :- A(x)

But contrast that with what happens if we have a definition which has some negation involved (where ~ is negation):

A(x) :- ~ B(x)
B(x) :- ~ A(x)

Here a stratification is impossible: A(x) must have a stratification number greater than B(x), and B(x) must have a stratification number greater than A(x). My first thought was that this is not okay because it is a circular definition, but stratification is fine with circularity so long as the predicates are not negated. But why? Truth values are simply binary. It seems extremely arbitrary to treat formulas which have a negation symbol differently in this manner. What is stratification trying to prevent in the second case that isn't a problem in the first?

Solution

I think the problem with:

A(x) :- \+ B(x)

B(x) :- \+ A(x)

...is that it has ambiguous semantics. This program has two minimal models, namely, {A(x)} and {B(x)}, and is therefore not well-defined under the fixed point semantics (no least fixed point) or under the model theoretic semantics (no unique minimal model).
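To make the "two minimal models" claim concrete, here is a rough Python sketch that brute-forces the models of a tiny ground program. It assumes a single constant, so A and B stand for the ground atoms A(c) and B(c), and it reads each rule as a classical implication. The rule encoding and the helper names `is_model` and `minimal_models` are made up for this illustration and are not part of any Datalog engine.

```python
from itertools import combinations

# A ground rule is (head, positive_body, negative_body); both bodies are sets of atoms.
def is_model(interp, rules):
    """True if every rule, read as a classical implication, holds in interp."""
    for head, pos, neg in rules:
        body_true = pos <= interp and not (neg & interp)
        if body_true and head not in interp:
            return False
    return True

def minimal_models(atoms, rules):
    """Enumerate all subsets of atoms, keep the models, then keep the minimal ones."""
    atoms = sorted(atoms)
    models = [
        frozenset(c)
        for k in range(len(atoms) + 1)
        for c in combinations(atoms, k)
        if is_model(frozenset(c), rules)
    ]
    return [m for m in models if not any(other < m for other in models)]

# A :- B.   B :- A.
positive = [("A", frozenset({"B"}), frozenset()),
            ("B", frozenset({"A"}), frozenset())]

# A :- not B.   B :- not A.
negated = [("A", frozenset(), frozenset({"B"})),
           ("B", frozenset(), frozenset({"A"}))]

print(minimal_models({"A", "B"}, positive))
print(minimal_models({"A", "B"}, negated))
```

For the positive program the only minimal model is the empty set, so "nothing is derivable" is an unambiguous answer. For the negated program the enumeration finds both {A} and {B}, and nothing in the program says which one to prefer.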

In order to address this problem, stratified semantics for Datalog imposes restrictions on the syntax of Datalog programs such that, if a stratification exists for the program, then evaluating it one stratum at a time yields a single, well-defined minimal model, and the result does not depend on which stratification is chosen. (The converse does not hold, I believe: a program that cannot be stratified may still happen to have a unique minimal model.)
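Whether a stratification exists can be checked mechanically from the two conditions quoted in the question. The following Python sketch shows one simple way to do it, not how any particular Datalog system does it: start every predicate at stratum 0 and keep raising strata until both conditions hold. If a stratum ever reaches the number of predicates, there must be a cycle through negation. The rule encoding and the function name `stratify` are assumptions made for the example.

```python
def stratify(rules):
    """Return {predicate: stratum}, or None if no stratification exists.

    Each rule is (head, [(body_predicate, is_negated), ...]).
    """
    preds = {head for head, _ in rules} | {q for _, body in rules for q, _ in body}
    stratum = {p: 0 for p in preds}
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            for q, is_negated in body:
                # Positive dependency: head >= q. Negative dependency: head > q.
                required = stratum[q] + 1 if is_negated else stratum[q]
                if stratum[head] < required:
                    stratum[head] = required
                    changed = True
                    # With n predicates, a valid stratum never needs to reach n.
                    if required >= len(preds):
                        return None
    return stratum

# A(x) :- B(x).   B(x) :- A(x).
positive = [("A", [("B", False)]), ("B", [("A", False)])]

# A(x) :- not B(x).   B(x) :- not A(x).
negated = [("A", [("B", True)]), ("B", [("A", True)])]

print(stratify(positive))
print(stratify(negated))
```

The positive cycle ends up with both predicates in stratum 0, while the cycle through negation makes the strata grow without bound and is rejected with `None`.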

You can find more on the precise details of stratified semantics for Datalog in the text "Foundations of Databases" by Serge Abiteboul, Richard Hull, and Victor Vianu, which happens to be freely available online, with the relevant detail in Chapter 15. This excellent text also explains most of the other terms I've used above, like model, fixed point, etc., if you're stuck.

Licensed under: CC-BY-SA with attribution
