Question

I'm trying to write a program that requires the calculation of a specific value dealing with Boolean functions. Given a single-output Boolean function f, given by a cover F, let's say I define the density of the function as the fraction of all input vectors where the function has the value 1.

For example, suppose I pass in the function f(a, b, c) defined by the cover F = ab'+c'. The function has 5 ON-set minterms out of 8 total minterms, hence its density is d(f) = 5/8 = 0.625. Note that cube ab' covers 2 minterms and cube c' covers 4 minterms, but one of those minterms is covered by both cubes.

Can anyone think of a good algorithm to handle this? I have a strong suspicion that it will be best expressed recursively, but I'm having trouble pinning down something efficient.

Solution

The bad news: there's no hope for an algorithm that is always fast.

Namely, this problem:

Given a Boolean formula in conjunctive normal form (product of sums), decide whether there is an assignment of the variables that makes the formula true

is NP-complete. This means that if you find a polynomial-time algorithm that solves it, you can also solve some of the world's hardest problems (knapsack, the travelling salesman problem, and the Hamiltonian cycle problem, to name a few) in polynomial time. No one really expects that to be possible.

This problem is, in fact, equivalent to the following (negating a CNF formula with De Morgan's laws yields a DNF formula, and the CNF formula is unsatisfiable exactly when that DNF formula has density 100%):

Given a Boolean formula in disjunctive normal form (sum of products), decide whether its density is 100%


The good news:

The input size is likely to be very small. At three variables, you don't really care about speed. At thirty input variables, you are still more likely to run out of memory (with some algorithms) than to run unbearably long.

Algorithm #1: O(2^v * i) time, a few lines of code; v = number of variables, i = length of the input.

  • if any clause is inconsistent (A & !A), drop it.
  • sort the clauses by size, largest first (fewest literals, i.e. covering the most minterms).
  • for every minterm in the universal set
    • for every clause in the input
      • for every literal in the clause
        • if the minterm is not covered by the literal, continue to the next clause
      • The minterm is covered by every literal of the clause:
      • Count the minterm as covered, and continue to the next minterm.
    • The minterm is not covered by any clause; continue to the next minterm.
  • let density = [#covered minterms]/[#minterms]
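
To make this concrete, here is a minimal Rust sketch of Algorithm #1. The function name dnf_density and the cube encoding (each cube as a list of (variable index, polarity) literals) are my own choices, not anything from the question:

pub fn dnf_density(cubes: &[Vec<(usize, bool)>], num_vars: usize) -> f64 {
    let total = 1u64 << num_vars; // 2^v minterms in the universal set
    let mut covered = 0u64;
    for minterm in 0..total {
        // A cube covers the minterm iff every one of its literals agrees with it.
        let is_covered = cubes.iter().any(|cube| {
            cube.iter()
                .all(|&(var, polarity)| ((minterm >> var) & 1 == 1) == polarity)
        });
        if is_covered {
            covered += 1;
        }
    }
    covered as f64 / total as f64
}

fn main() {
    // f(a, b, c) = ab' + c'  (a = bit 0, b = bit 1, c = bit 2)
    let cover = vec![vec![(0, true), (1, false)], vec![(2, false)]];
    println!("{}", dnf_density(&cover, 3)); // prints 0.625
}

The sketch skips the drop-inconsistent-clauses and sorting steps; they only speed up the inner loops and do not change the result.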

If you want to run faster, you will need to hope for good input. You can try using binary decision diagrams (BDDs) to encode the current set of covered minterms, updating the diagram as you add clauses from the input.

A binary decision diagram is a rooted directed acyclic graph such that every node is either a decision node (test a single variable, then take either the false branch or the true branch) or a leaf node (either true or false). For example, XOR can be expressed with this binary decision diagram:

     |
     A
    / \
   /   \
  B     B
 / \   / \
0   1 1   0
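
In code, the diagram might be sketched like this (the Bdd type and xor_bdd are illustrative placeholders, not any particular BDD library; variable 0 plays the role of A, variable 1 the role of B):

enum Bdd {
    Leaf(bool),
    // Decision node: test `var`; follow `lo` when false, `hi` when true.
    Node { var: usize, lo: Box<Bdd>, hi: Box<Bdd> },
}

// XOR of variables 0 (A) and 1 (B), matching the diagram above.
fn xor_bdd() -> Bdd {
    Bdd::Node {
        var: 0, // A
        lo: Box::new(Bdd::Node {
            var: 1, // B on the A = false branch
            lo: Box::new(Bdd::Leaf(false)),
            hi: Box::new(Bdd::Leaf(true)),
        }),
        hi: Box::new(Bdd::Node {
            var: 1, // B on the A = true branch
            lo: Box::new(Bdd::Leaf(true)),
            hi: Box::new(Bdd::Leaf(false)),
        }),
    }
}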

Algorithm #2 (lazily expanding BDD; more complicated, but potentially faster for large numbers of variables):

  • if any clause is inconsistent (A & !A), drop it.
  • sort the clauses by size, largest first (fewest literals, i.e. covering the most minterms).
  • start with an empty BDD (root = false)
  • for every clause
    • update the BDD: start on the root.
    • for every variable:
      • if the clause has no more literals, replace the current node with true (only at the edge along which you came)
        • if the immediate sibling is also true, replace the common parent with true, and repeat this test for the new node.
      • if the current node is true
        • continue to the next clause or recursion branch, or join the sibling thread.
      • if the variable is in the clause and in the current BDD node,
        • descend to the child that intersects the clause.
    • if the variable is in the clause but not in the current BDD node,
        • create a new BDD node
        • set both children to the current node
        • replace the current node with the new node (only at the edge along which you came)
        • descend to the child that intersects the clause.
      • if the variable is in the BDD but not in the clause
        • descend to both children in turn recursively. Alternatively, descend to both children in parallel in separate threads.
      • if the variable is neither in the BDD nor in the clause
        • do nothing.
  • let density = 0/(2^v) (the denominator is the suggested scale if you compute with integers)
  • every leaf node is either true or false. For every path in the BDD
    • if the leaf node is true
      • let length be the number of non-leaf nodes encountered along the path.
      • add 1/(2^length) to density.
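
The final counting step has a natural recursive form: each decision node splits the remaining input space in half, so a path of length L ending in a true leaf contributes 1/(2^L), exactly as described above. A sketch, reusing the Bdd type from the XOR example (the function name density is mine):

fn density(bdd: &Bdd) -> f64 {
    match bdd {
        Bdd::Leaf(true) => 1.0,
        Bdd::Leaf(false) => 0.0,
        // Each child covers half of the remaining input space. A variable
        // skipped along a path is accounted for automatically: the path is
        // shorter, so its leaf gets a proportionally larger weight.
        Bdd::Node { lo, hi, .. } => 0.5 * density(lo) + 0.5 * density(hi),
    }
}

fn main() {
    println!("{}", density(&xor_bdd())); // prints 0.5
}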

Other tips

Let f and g be Boolean expressions.

Recursive breakdown

d(f and g) = d(f)d(g|f)
d(f or g) = d(f) + d(g) - d(f and g)
for d(g|f) you do unit/expression propagation of f in g

E.g. d(x' + y | x) = d(y)     (I do not have a way to implement it for non-literal conditions on the right side of |, apart from brute force)

d(g|f) can also be read as: what is the density of g within the true area of f.
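
When f is a single literal, the propagation is mechanical for a DNF cover (the question's input format): setting x = true deletes x from every cube that contains it and drops every cube that contains x'. A sketch, using the same cube representation as the brute-force sketch above (the helper name is mine):

fn condition_on_literal(
    cubes: &[Vec<(usize, bool)>],
    var: usize,
    value: bool,
) -> Vec<Vec<(usize, bool)>> {
    cubes
        .iter()
        // Drop cubes containing the opposite literal: they become false.
        .filter(|cube| !cube.iter().any(|&(v, p)| v == var && p != value))
        // Delete the satisfied literal from the remaining cubes.
        .map(|cube| cube.iter().copied().filter(|&(v, _)| v != var).collect())
        .collect()
}

For example, conditioning the cover of x' + y on x drops the cube x' and leaves the cover y, giving d(x' + y | x) = d(y) as above.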

Example 1:

d(ab'+c)
= d(ab') + d(c) - d(ab'c)
= d(a)d(b') + d(c) - d(a)d(b')d(c)
= 0.5*0.5 + 0.5 - 0.5*0.5*0.5
= 0.625

Example 2:

d((a+b)(a+c))
= d(a+b)d(a+c|a+b)
= (d(a) + d(b) - d(a)*d(b))*(d(a|a+b) + d(c|a+b) - d(ac|a+b))
= 0.75*((2/3) + 0.5 - (1/3))
= 0.625

Probably not better than brute force, but it is recursive.
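
It does turn into compact code for a DNF cover, though. A sketch of the breakdown d(c + g) = d(c) + d(g) - d(c)d(g|c), where c is the first cube and conditioning on c propagates each of its literals in turn, reusing condition_on_literal from above (my own sketch of the idea, not a tested implementation):

fn dnf_density_rec(cubes: &[Vec<(usize, bool)>]) -> f64 {
    match cubes.split_first() {
        None => 0.0, // empty cover: the constant-0 function
        Some((first, rest)) => {
            if first.is_empty() {
                return 1.0; // an empty cube covers everything
            }
            let d_c = 0.5_f64.powi(first.len() as i32); // d(cube) = 2^-k
            // d(rest | cube): propagate every literal of the cube.
            let mut conditioned = rest.to_vec();
            for &(var, value) in first {
                conditioned = condition_on_literal(&conditioned, var, value);
            }
            d_c + dnf_density_rec(rest) - d_c * dnf_density_rec(&conditioned)
        }
    }
}

fn main() {
    // f(a, b, c) = ab' + c'  ->  d = 0.625
    let cover = vec![vec![(0, true), (1, false)], vec![(2, false)]];
    println!("{}", dnf_density_rec(&cover));
}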

Also, it is easier to compute d(g|f) when f is a cube or a literal, so the following identity can be used to swap sides:

d(g|f)d(f) = d(f|g)d(g)

Another example, making use of the identity above (to swap the arguments of the conditional):

d((a+b)(a'+b))
= d(a+b)d(a'+b|a+b)
= 0.75(d(a'|a+b) + d(b|a+b) - d(a'b|a+b))
= 0.75(d(a+b|a')d(a')/d(a+b) + d(a+b|b)d(b)/d(a+b) - d(a+b|a'b)d(a'b) / d(a+b))
= 0.75(d(b)d(a') + d(1)d(b) - d(1)*d(a')*d(b))/d(a+b)
= 0.75*(0.5*0.5 + 1*0.5 - 1*0.5*0.5)/0.75
= 0.5

An interesting pattern occurs for a 2-out-of-3 (majority) circuit:

d((a+b)(b+c)(c+a))
= d(a+b)d((b+c)(c+a)|a+b)
. . . skipping many steps . . .
= d((a+b)(b+c)|c)d(c) + d((a+b)(b+c)|a)d(a) - d((a+b)(b+c)|ca)d(ca)
= d(a+b)d(c) + d(b+c)d(a) - d(1)d(c)d(a)
= 0.75*0.5 + 0.75*0.5 - 1*0.5*0.5
= 0.5

One more trick: to reduce branching, the following identity can be used:

d(a+b) = 1 - d(a'b')

That is basically De Morgan's theorem used for a different purpose. Now, instead of branching three ways to process an OR, only one branch is needed.

Your example again, showing all steps and using De Morgan's identity:

d(ab'+c')
= 1 - d((ab')'c)
= 1 - d((ab')')d(c|(ab')')
= 1 - (1 - d(ab'))d((ab')'|c)d(c) / d((ab')')
= 1 - (1 - d(ab'))(1 - d(ab'|c))d(c) / (1 - d(ab'))
= 1 - (1 - 0.25)(1 - d(ab'))*0.5 / (1 - 0.25)
= 1 - (1 - 0.25)(1 - 0.25)*0.5 / (1 - 0.25)
= 0.625

(The complement of ab'+c' is (ab')'c, and since c does not appear in ab', d(ab'|c) = d(ab').)

Adding a cache for common subexpressions should provide a speed boost too.

After a bit of playing around, I've come up with a nice recursive expression for the density of a Boolean expression in conjunctive normal form.

let f be the first CNF clause
let g be the remaining clauses
d(fg) = d(g) - d(g|f')d(f')

Each recursion strips one clause off the left-hand branch and three variables off the right-hand branch (for 3-CNF). (d(f') takes constant time, because f' is a single clause.)

Here is what that last expression looks like in Rust.

pub fn cnf_density(mut cnf: CnfFormula) -> f64 {
    // An empty CNF is the constant true: density 1.
    if cnf.is_empty() {
        return 1.0;
    }
    // d(fg) = d(g) - d(g|f')d(f'), with f the first clause and g the rest
    let clause1 = cnf.remove(0);
    let mut cnf2 = cnf.clone();
    for lit in &clause1 {
        // unit propagation of f' in g, giving g|f'
        cnf2.assign_lit(lit.negated());
    }
    let d1 = cnf_density(cnf);  // d(g)
    let d2 = cnf_density(cnf2); // d(g|f')
    // d(f') = 2^-k for a clause of k literals
    d1 - d2 * 2.0_f64.powf(-(clause1.len() as f64))
}

Found another trick. You can start with f = 1 and d(f) = 1, then calculate the new densities from the old as you add clauses to build up your CNF expression:

d(f(a+b)) = d(f) - d(f|!a!b)d(!a!b)

You'll need a limited cache of the density function, keyed by its input, for any of these techniques if you want any hope of performance. It's still exponential. The great thing about the d(f(a+b)) = d(f) - d(f|!a!b)d(!a!b) technique is that it gives you a decreasing upper bound, so you can terminate it at any point and it still tells you what the density will be less than. It would be great if there were a similar trick for an increasing lower bound.

Last trick in Rust:

pub fn cnf_density(&mut self, cnf: CnfFormula, lvl: usize) -> f64 {
    // An empty CNF is the constant true: density 1.
    if cnf.is_empty() {
        return 1.0;
    }
    // A lone empty clause is the constant false: density 0.
    if cnf.len() == 1 && cnf[0].is_empty() {
        return 0.0;
    }
    if let Some(d) = self.cache.get(&cnf) {
        return *d;
    }
    // d(f(a+b)) = d(f) - d(f|!a!b)d(!a!b)
    let mut f = 1.0_f64;              // d(1) = 1: density of the empty prefix
    let mut cnf2 = CnfFormula::new(); // the prefix of clauses added so far
    let mut idx: usize = 0;
    // Keep the cache from growing without bound.
    if self.cache.len() > 1000 {
        self.cache.clear();
    }
    for clause in &cnf {
        idx += 1;
        // Progress output at the top level only.
        if lvl == 0 {
            println!("{:?}", clause);
            println!("{}/{}", idx, cnf.len());
            println!("{}", self.cache.len());
            println!("d(f) = {}", f);
        }
        // cnf3 = (prefix so far) conditioned on the negation of the new clause
        let mut cnf3 = cnf2.clone();
        cnf2.push(clause.clone());
        for lit in clause {
            cnf3.assign_lit(lit.negated());
        }
        // d(!a!b...) = 2^-k for a clause of k literals
        f -= self.cnf_density(cnf3, lvl + 1) * 2.0_f64.powf(-(clause.len() as f64));
    }
    self.cache.insert(cnf, f);
    f
}
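
The struct holding the cache is not shown; it would need to look something like the following (DensitySolver is a placeholder name, and CnfFormula would have to implement Hash and Eq to be usable as a key):

use std::collections::HashMap;

pub struct DensitySolver {
    cache: HashMap<CnfFormula, f64>,
}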