Question

It is not hard to see that SAT for a CNF formula with $n$ variables and a constant number of clauses can be solved in polynomial time. On the other hand, it is not hard to see that a CNF formula with $n$ variables and $O(n)$ clauses is enough for NP-hardness (consider for example the instances of SAT associated with the natural formula for 3-colorability, applied to planar graphs).

We could define this formally as $\text{CNFSAT}-f-\text{Clauses}$, a family of problems parameterized by a function $f$, in which instances are formulas in CNF such that if they have $n$ variables, then they have at most $f(n)$ clauses. Based on this, what I'd like to know is what is the smallest function $g$ such that we know there exists $f \in O(g)$ such that $\text{CNFSAT}-f-\text{Clauses}$ is already NP-hard. We know that g = 1 (constant # of clauses) does not work, and $g = n$ (linear number of clauses) works.

What about $g = \log n$? Is there a simple algorithm for CNFSAT over formulas that have $O(\lg \lg n)$ clauses?

Was it helpful?

Solution

Lower bound. For $g \le c \cdot \sqrt{\log n}$ there exists a polynomial-time algorithm. The idea is the following: if some clauses have too many variables, then it should be trivial to select some variable to satisfy this clause, without hurting clauses with few variables. We repeat the following:

Find the clause with the smallest number of variables. Let $x_1,\ldots,x_k$ be the variables participating in this clause.

  • If $k > g$, then the entire formula is satisfiable (we process clauses one by one and select a variable which we didn't select before).
  • Otherwise, we remove the clause. We also remove $x_1,\ldots,x_k$ from all other clauses.

Now, we have to satisfy the removed clauses. Since there are at most $g$ clauses and each of them introduces at most $g$ new variables, it means that there are at most $g^2 = c^2 \cdot \log n$ variables overall. Therefore, there are at most $n^{c^2}$ variable combinations, and we can just use brute-force.

Conditional upper bound. It is almost tight in the following sense. Assume that the lower bound on the SAT with $n$ variables and $\ge c\cdot n$ clauses (for some $c$, e.g. coming from $3$-coloring) is $\alpha^n$ ($\alpha \in (1, 2]$). Note that the same lower bound holds after our transformation (since we can just apply it before any algorithm). Therefore, if there are at least $\log^{1+\epsilon} n$ clauses, they can have $\frac {\log^{1+\epsilon} n} c$ variables and the lower bound for running time for our problem is

$$\alpha^{\frac {\log^{1+\epsilon} n} c} = n^{\frac {\log^\epsilon n \cdot \log \alpha} {c}},$$

which is super-polynomial.

Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange
scroll top