Question

Given an instance of SAT, I would like to be able to estimate how difficult it will be to solve the instance.

One way is to run existing solvers, but that kind of defeats the purpose of estimating difficulty. A second way might be looking at the ratio of clauses to variables, as is done for phase transitions in random SAT, but I am sure better methods exist.
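
For concreteness, the check I have in mind is just this (a minimal sketch, assuming DIMACS CNF input; the hardness peak near 4.26 applies to uniform random 3-SAT, not to structured instances):

```python
# Minimal sketch: clause-to-variable ratio from a DIMACS CNF header.
# For uniform random 3-SAT, instances near m/n ~= 4.26 tend to be the
# hardest; the ratio says little about structured/industrial instances.
def clause_variable_ratio(path):
    with open(path) as f:
        for line in f:
            if line.startswith("p cnf"):
                _, _, n, m = line.split()
                return int(m) / int(n)
    raise ValueError("no DIMACS problem line found")

# e.g. clause_variable_ratio("instance.cnf") -> 4.27 would suggest the
# hard region for a uniform random 3-SAT instance
```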

Given an instance of SAT, are there some fast heuristics to measure the difficulty? The only condition is that these heuristics be faster than actually running existing SAT solvers on the instance.


Related question

Which SAT problems are easy? on cstheory.SE. That question asks about tractable sets of instances. This is a similar question, but not exactly the same. I am really interested in a heuristic that, given a single instance, makes some sort of semi-intelligent guess as to whether the instance will be hard to solve.


Solution

In general, this is a very relevant and interesting research question. "One way is to run existing solvers..." — and what exactly would that tell us? We could observe empirically that an instance seems hard for a specific solver or a specific algorithm/heuristic, but what does that really say about the hardness of the instance itself?

One line of work that has been pursued is the identification of various structural properties of instances that lead to efficient algorithms. These properties should ideally be "easily" identifiable. An example is the topology of the underlying constraint graph, measured using various graph width parameters. For example, it is known that an instance is solvable in polynomial time if the treewidth of the underlying constraint graph is bounded by a constant.
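
To illustrate, here is a hedged sketch of that check: build the primal (variable-interaction) graph of a CNF and compute a quick treewidth upper bound with a min-degree heuristic. It assumes the networkx library; since exact treewidth is NP-hard to compute, only an upper bound is obtained:

```python
import networkx as nx
from networkx.algorithms.approximation import treewidth_min_degree

def primal_graph(clauses):
    """Vertices = variables; an edge whenever two variables share a clause."""
    g = nx.Graph()
    for clause in clauses:
        vs = sorted({abs(lit) for lit in clause})
        g.add_nodes_from(vs)
        g.add_edges_from((u, v) for i, u in enumerate(vs) for v in vs[i + 1:])
    return g

# Toy CNF as DIMACS-style literal lists: (x1 v ~x2 v x3) & (~x1 v x2) & ...
clauses = [[1, -2, 3], [-1, 2], [2, -3, 4]]
width, _tree = treewidth_min_degree(primal_graph(clauses))
print("treewidth upper bound:", width)  # small width suggests an easy instance
```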

Another approach has focused on the role of hidden structure in instances. One example is a backdoor set, i.e., a set of variables such that, once they are instantiated, the remaining problem simplifies to a tractable class. For example, Williams et al., 2003 [1] show that even when taking into account the cost of searching for backdoor variables, one can still obtain an overall computational advantage by focusing on a backdoor set, provided the set is sufficiently small. Furthermore, Dilkina et al., 2007 [2] note that a solver called Satz-Rand is remarkably good at finding small strong backdoors on a range of experimental domains.
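
As a toy illustration of the definition (not the algorithms of [1] or [2]), here is a brute-force sketch that searches for a small strong backdoor with respect to a 2-CNF base class; it is exponential in the backdoor size and only meant to make the concept concrete:

```python
from itertools import combinations, product

def simplify(clauses, assignment):
    """Apply a partial assignment {var: bool}; return None as soon as
    some clause is falsified outright."""
    out = []
    for clause in clauses:
        new, satisfied = [], False
        for lit in clause:
            v = abs(lit)
            if v in assignment:
                if (lit > 0) == assignment[v]:
                    satisfied = True
                    break
            else:
                new.append(lit)
        if satisfied:
            continue
        if not new:
            return None  # clause falsified by the assignment
        out.append(new)
    return out

def is_strong_backdoor(clauses, candidate):
    """Strong backdoor w.r.t. a 2-CNF base class: under EVERY assignment
    of the candidate variables, the residual formula must be a 2-CNF
    (or already decided)."""
    for bits in product((False, True), repeat=len(candidate)):
        rest = simplify(clauses, dict(zip(candidate, bits)))
        if rest is None:
            continue  # this branch is already decided (unsatisfiable)
        if any(len(c) > 2 for c in rest):
            return False
    return True

def smallest_backdoor(clauses, max_size):
    variables = sorted({abs(l) for c in clauses for l in c})
    for k in range(max_size + 1):
        for cand in combinations(variables, k):
            if is_strong_backdoor(clauses, cand):
                return cand
    return None  # no backdoor of size <= max_size for this base class
```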

More recently, Ansótegui et al., 2008 [3] propose the use of tree-like space complexity as a measure for DPLL-based solvers. They prove that constant-bounded space also implies the existence of a polynomial-time decision algorithm, with the space being the degree of the polynomial (Theorem 6 in the paper). Moreover, they show that the space is smaller than the size of cycle cutsets. In fact, under certain assumptions, the space is also smaller than the size of backdoors.
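
To make the cycle-cutset comparison concrete, here is an illustrative greedy upper bound on a cycle cutset (feedback vertex set) of the primal graph; finding a minimum cutset is NP-hard, so this is only a rough estimate, and `primal_graph` is the helper from the treewidth sketch above:

```python
import networkx as nx

def greedy_cycle_cutset(g):
    """Greedy upper bound on a cycle cutset: repeatedly prune vertices of
    degree <= 1 (they cannot lie on a cycle), then delete the
    highest-degree vertex until no cycles remain."""
    g = g.copy()
    cutset = []
    def prune(h):
        while True:
            low = [v for v in h if h.degree(v) <= 1]
            if not low:
                return
            h.remove_nodes_from(low)
    prune(g)
    while g.number_of_edges() > 0:  # a leaf-free graph with edges has a cycle
        v = max(g.nodes, key=g.degree)
        cutset.append(v)
        g.remove_node(v)
        prune(g)
    return cutset

# e.g. a small cutset on primal_graph(clauses) hints at an easier instance
```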

They also formalize what I think you are after, that is:

Find a measure $\psi$ and an algorithm that, given a formula $\Gamma$, decides satisfiability in time $O(n^{\psi(\Gamma)})$. The smaller the measure, the better it characterizes the hardness of a formula.


[1] Williams, Ryan, Carla P. Gomes, and Bart Selman. "Backdoors to typical case complexity." International Joint Conference on Artificial Intelligence. Vol. 18, 2003.

[2] Dilkina, Bistra, Carla Gomes, and Ashish Sabharwal. "Tradeoffs in the Complexity of Backdoor Detection." Principles and Practice of Constraint Programming (CP 2007), pp. 256-270, 2007.

[3] Ansótegui, Carlos, Maria Luisa Bonet, Jordi Levy, and Felip Manyà. "Measuring the Hardness of SAT Instances." In Proceedings of the 23rd National Conference on Artificial Intelligence (AAAI'08), pp. 222-228, 2008.

OTHER TIPS

Since you know about the phase transition, allow me to mention a few other simple checks that I am aware of (which are likely subsumed by the constraint graph analysis):

  • Some early random SAT generators inadvertently created mostly easy formulas because they used "constant density", i.e., a roughly equal proportion of all clause lengths. These were mostly easy because 2-clauses and unit clauses simplify the problem significantly, as one would expect, while very long clauses either add little branching or even facilitate hyper-resolution. It therefore seems better to stick with fixed-length clauses and vary other parameters (see the sketch after this list).
  • Similarly, pure literal elimination is a powerful simplification rule: in a sense, a formula is only as hard as its impure literals. Since resolution on a variable $x$ creates up to $|x| \cdot |\lnot x|$ new clauses (the product of the numbers of occurrences of $x$ and of its negation), the number of resolvents is maximized when each variable has an equal number of positive and negative occurrences. Hence, balanced SAT generators.
  • There is also a fairly recent technique called "No-Triangle SAT" [1], a hard-instance generator that forbids "triangles": sets of three variables $v_1, v_2, v_3$ that occur pairwise in some set of clauses (i.e., $\{v_1, v_2, \dots\}, \{v_2, v_3, \dots\}, \{v_1, v_3, \dots\}$). Apparently, triangles tend to make a formula easier for known solvers.

[1] https://arxiv.org/pdf/1903.03592.pdf
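
Tying these bullets together, here is a hedged sketch (my own construction, not taken from the cited papers) of a fixed-length random k-SAT generator that keeps polarities balanced per variable, plus a counter for one simple reading of the "triangles" above (triangles of the variable co-occurrence graph):

```python
import random
from itertools import combinations
import networkx as nx

def balanced_ksat(num_vars, num_clauses, k=3, seed=0):
    """Fixed-length random k-SAT with per-variable polarity balance:
    each variable appears about equally often positively and negatively,
    maximizing the potential number |x| * |~x| of resolvents."""
    rng = random.Random(seed)
    pos = [0] * (num_vars + 1)
    neg = [0] * (num_vars + 1)
    clauses = []
    for _ in range(num_clauses):
        clause = []
        for v in rng.sample(range(1, num_vars + 1), k):
            if pos[v] < neg[v]:          # pick the lagging polarity
                sign = 1
            elif neg[v] < pos[v]:
                sign = -1
            else:
                sign = rng.choice((1, -1))
            (pos if sign > 0 else neg)[v] += 1
            clause.append(sign * v)
        clauses.append(clause)
    return clauses

def count_triangles(clauses):
    """Count variable triples that co-occur pairwise across clauses
    (one simple reading of the 'triangles' that reportedly make
    instances easier)."""
    g = nx.Graph()
    for c in clauses:
        for a, b in combinations({abs(l) for l in c}, 2):
            g.add_edge(a, b)
    return sum(nx.triangles(g).values()) // 3
```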

On top of Juho's excellent answer, here's another approach:

Ercsey-Ravasz & Toroczkai, Optimization hardness as transient chaos in an analog approach to constraint satisfaction, Nature Physics volume 7, pages 966–970 (2011).

This approach rewrites the SAT problem as a dynamical system whose attractors are exactly the solutions of the SAT instance. The basins of attraction of the system become more fractal as the problem gets harder, so the "difficulty" of a SAT instance can be measured by examining how chaotic the transients are before the system converges.

In practice, this means starting a bunch of solvers from different initial positions and examining the rate at which solvers escape the chaotic transients before they arrive at an attractor.

It's not hard to come up with a dynamical system whose "solutions" are solutions to a given SAT problem, but it is a little harder to make sure that the solutions are all attractors rather than repellers. Their solution is to introduce energy variables (akin to Lagrange multipliers) that represent how badly each constraint is being violated, and to have the system minimise this energy.
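
To give the flavor in code, here is a heavily hedged sketch of my reading of their continuous-time dynamics (clause functions $K_m$, gradient descent on the weighted energy, exponentially growing auxiliary weights), integrated with a naive Euler step rather than the paper's scheme; all names and step sizes are mine:

```python
import numpy as np

def clause_matrix(clauses, num_vars):
    """C[m, i] = +1/-1 if variable i+1 occurs positively/negatively
    in clause m, else 0 (DIMACS-style literal lists)."""
    C = np.zeros((len(clauses), num_vars))
    for m, clause in enumerate(clauses):
        for lit in clause:
            C[m, abs(lit) - 1] = 1.0 if lit > 0 else -1.0
    return C

def euler_step(s, a, C, dt=0.05):
    """s: analog 'spins' in [-1, 1], one per variable; a: positive
    auxiliary weights, one per clause. K[m] = 2^-k * prod(1 - c*s) is
    zero exactly when clause m is satisfied; s follows -grad of
    V = sum_m a[m] * K[m]^2, and a grows on violated clauses."""
    factors = np.where(C != 0.0, 1.0 - C * s, 1.0)
    k = (C != 0.0).sum(axis=1)
    K = 2.0 ** -k * factors.prod(axis=1)
    ds = np.zeros_like(s)
    for m, i in zip(*np.nonzero(C)):
        # K_mi: the clause term with the factor for variable i left out
        K_mi = 2.0 ** -k[m] * np.prod(np.delete(factors[m], i))
        ds[i] += 2.0 * a[m] * C[m, i] * K_mi * K[m]
    return np.clip(s + dt * ds, -1.0, 1.0), a + dt * a * K

def solve_time(clauses, num_vars, max_steps=20000, seed=0):
    """Steps until rounding s to signs satisfies the formula; the spread
    of these times over many random starts is the hardness signal."""
    rng = np.random.default_rng(seed)
    C = clause_matrix(clauses, num_vars)
    s, a = rng.uniform(-1.0, 1.0, num_vars), np.ones(len(clauses))
    for t in range(max_steps):
        signs = np.where(s >= 0.0, 1.0, -1.0)
        if ((C * signs) > 0).any(axis=1).all():
            return t
        s, a = euler_step(s, a, C)
    return None  # no solution found within the step budget
```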

Interestingly, using their dynamical system, you can solve SAT problems in polynomial time on an analog computer, which is in itself a remarkable result. There is a catch: it may require exponentially large voltages to represent the energy variables, so unfortunately this cannot be realised on physical hardware.
