Plans for preprocessing of SAT problems in general.
More...
Go to the source code of this file.
Detailed Description
Plans for preprocessing of SAT problems in general.
 Todo:
 Two general forms of preprocessing

"Preprocessing by reduction" is done by applying some "reduction", which simplifies the input.

The general aim here is to eliminate "trivial cases".

Preprocessing by reduction is heavily employed in worstcase upper bounds, where reductions (used at each node) guarantee that branching will have a strong effect in all branches in the instance.

"Preprocessing by amplification" on the other hand aims at making detection of inconsistencies (for CNFs) after application of partial assignments easier, by making the instance (after application) easy to refutate for (certain) reductions.

While preprocessing by reduction aims at simplifying the instance according to some measures, preprocessing by amplification typically needs to make the instance bigger (typically the introduction of new variables is essential).

Preprocessing by amplification is a more powerful process, needing to uncover deeper relations, while preprocessing by reduction (which is what is typically understood as "preprocessing" in these days) has more the character of a (necessary) hygienic measure.

The tools for preprocessing by reduction are delivered by module Satisfiability/Lisp/Reductions (see Satisfiability/Lisp/Reductions/plans/general.hpp).

A general scheme for preprocessing by amplification is "preprocessing
by local compilation" as discussed below, with the special case of "preprocessing by local dualisation" (see below).
 Todo:
 Preprocessing by reduction

The basic forms are:

application of forced assignments (see Satisfiability/Lisp/Reductions/plans/GeneralisedUCP.hpp)

application of autarkies (see Satisfiability/Lisp/Reductions/plans/GeneralisedUCP.hpp and for example ComputerAlgebra/Satisfiability/Lisp/Autarkies/plans/MatchingAutarkies.hpp)

variable elimination by DPreduction if the instance becomes "smaller" (in some sense) (see Satisfiability/Lisp/Reductions/plans/DPReductions.hpp)

replacing a subset of clauses by their set of prime clauses, if the instance becomes smaller in some sense (so it is (also) a reduction, and not (primarily) an amplification).

Replacing subsets by their sets of prime clauses should cover (and explain in a sense) all forms of (local) applications of resolution; for example 2subsumption resolution considers 2element subsets.

We need also to consider the case, that the set of all prime clauses is too big, but that some prime clauses are just very short, or even subsume some old clauses.

Or perhaps is subsumption crucial here? So we might not use all prime clauses, but at least all old clauses are subsumed.

So in effect single clauses are replaced by strict subclauses.

Attractive subsets are perhaps distinguished by a high clause density.

Addition of blocked clauses should be considered as amplification.
 Todo:
 Preprocessing by local compilation

Given a CNF F, a natural preprocessingbyamplification scheme is to partition (or, more generally, to cover) F by parts F_i, where each F_i is then analysed and compiled into some "active clause" C_i, a condition which is equivalent to F_i.

A "good condition" C_i determines for every partial assignment whether instantiation yields an alwaystrue or an alwaysfalse problem; in the former case we can drop the condition (an "autarky" was found), while in the latter case the whole (current) problem is unsatisfiable.

The active clauses C_i may be translated back into CNFs F_i', possibly using new variables (this seems to be crucial). Then the detection of C_i being a unsatisfiable or a tautology under a partial assignment (only to the original variables in F_i) should be achieved by means of certain reductions.

Considering only the unsatisfiability case, this notion of a good representation F_i' w.r.t. reduction r has been called a "rbased
representation" in "Investigating conditions and their representations" in Experimentation/Investigations/Cryptography/AdvancedEncryptionStandard/plans/general.hpp.

The tautology case doesn't seem to fit into the standard SAT methods (forced assignments and autarkies), but seems to need a specific test, which however can be easily achieved by just checking whether the partial assignment satisfies the original CNF F_i (using that every nonempty CNF is falsifiable).

The active clause C_i could just be some representation, for example by circuits or branching programs, which are then translated into CNFs F_i'. See "Preprocess a clauseset by local dualisation" below for the perhaps most natural starting point, namely using a DNF for C_i.

Since a nonempty DNF C_i is always satisfiable, and partial assignment phi yielding an unsatisfiable problem is easily detected by phi * F_i = {}. However, using for example circuits and partial evaluation, this is not necessarily the case.

How is it with OBDDs?

At the OBDDlevel, applying a partial assignment should be possible in polynomial time, and then we can just check whether we have a constant function.

But is the preserved by the translation into a circuit?

Likely, if the circuit can be easily checked for unsatisfiability, then the Tseitintranslation should preserve that.

However, intuitively it seems unlikely to me (OK), that the translation from the OBDD to a circuit preserves unsatisfiability detection, since this likely needs some form of not completely trivial computation on the OBDDlevel (namely the instantiation).

List of possible applications:

See "Examples" below.

Ramseytype problems should be interesting candidates due to their high level of "internal repetition".

See "Preprocessing" in Investigations/RamseyTheory/VanderWaerdenProblems/plans/general.hpp.
 Todo:
 Preprocess a clauseset by local dualisation

Partition (cover) the CNF F into F_i, compute good (small) DNF representations G_i for the parts F_i, translate the G_i into CNFs F_i', obtaining the preprocessed F' (with parts F_i').

Compare "Investigating conditions and their representations" in Experimentation/Investigations/Cryptography/AdvancedEncryptionStandard/plans/general.hpp; there the (unique) DNFs are already given.

A "good" DNF representation means here

small number of clauses

using only variables on which the problem depends.
The number of clauses equals the number of new variables.

Computing the DNF representation by DLL:

One way of obtaining a DNF representation is by computing a splitting tree for F_i, where r_k can (should) be used (perhaps k=3), and where the tree should be condensed.

See ComputerAlgebra/Satisfiability/Lisp/Backtracking/plans/SplittingTrees.hpp. For really hard problems one could seek for an optimal r_ksplitting tree.

One obtains a hitting DNF from that, and then, as discussed in Cryptology/Lisp/Cryptanalysis/Rijndael/plans/SboxAnalysis.hpp and in Satisfiability/Lisp/Primality/plans/PrimeImplicatesImplicants.hpp, a shortened DNF (consisting only of prime implicants) is computed. (Note that in these two case studies the roles of CNF/DNF are reversed; but it's always the problem of "dualisation" of a clauseset.)

An alternative for small numbers of variables is discussed in "Minimisation" in OKlib/Satisfiability/FiniteFunctions/plans/general.hpp.

A natural candidate for the translation G_i > F_i' is dualts_fcl:

For partial assignments phi via r_1(phi * F_i') an inconsistent problem F_i is detected.

See "Understanding dualts_fcl" in Lisp/FiniteFunctions/plans/TseitinTranslation.hpp.

The alwaystrue case (i.e., whether phi * F is a tautology) would be detected by checking whether phi * F_i = {} holds. This could be called "tautology monitoring". The point is that F_i' can not yield any restriction on the primary variables, while the auxiliary variables of F_i' are local, and thus restrictions on them don't matter. Thus once phi * F_i = {} holds, one can safely remove F_i' (i.e., its current residuum).

For permutations like the Sbox, dualts_fcl should be good (and no tautology monitoring is needed, since instantiation can not yield tautologies), but in general "sharing" should be important.

One also needs to consider the choice F_i' = set of all prime implicates of F_i:

Without using new variables, this is the unique minimal choice when demanding that for all partial assignments phi with unsatisfiable phi * F_i the clauseset phi * F_i' contains the empty clause.

This F_i' has two basic computations, by computing the resolution closure (plus subsumption eliminination; see min_resolution_closure_cs), or by computing some DNF representation and then the combinatorial dual (plus subsumption elimination; see dual_cs).

In the (special) case that F_i' is small, this should be definitely the best choice.

Finding a good cover/partition (F_i)

Easiest to start with considering a constant N and letting the F_i only contain N variables (and all clauses using only these k variables).

Given a set V of variables, let F(V) be the set of clauses C in F with var(C) <= V.

The partitioning then is F_1, ..., F_{n/N}, R, where F_i = F(V_i), and where R contains the clauses of F not contained in any of the F_i.

R is left as it is if it contains too many variables.

Some optimisation problem has to be solved  which?! I.e., when is F(V) "good"?

One could just optimise c(F(V)) (the number of clauses).

However what about short versus long clauses? Short clauses should be more valuable.

Simplest to give binary clauses weight 1, and using a factor alpha < 1 and weights W(C) := alpha^(k2) for clauses of length k.

A "classical" factor is alpha = 1/5 (used for the OKsolver_2002), but perhaps here alpha = 1/2 is more appropriate (in the boolean case)?

The task is now to partition the variable set V into V_1, ..., V_{n/N}, such that the sum of the w(F(V_i)) is maximised, where w(F) = sum_{C in F} w(C).

Note that summing up the w(F(V_i)) makes sense despite additivity of w, since we are considering a covering, not a partitioning.
 Todo:
 Finding and exploiting functional dependencies

Given a boolean function B in variables x_1, ..., x_n, variable x_i is dependent on S <= {x_1,...,x_n}  {x_i} if for every partial assignment phi with var(phi) = S the value of x_i in phi*B is forced (has at most one possibility to make phi*B true).

If we find x_i dependent on {}, then we can just add the corresponding unitclause to the representation of F.

If x_i depends on {x_j}, and this is minimal, then either x_i = x_j or x_i = x_j, and this substitution should be performed in F, while by adding the corresponding equivalence we maintain equivalence.

These simplifications are likely "always" beneficial.

What about a dependency on two variables?

One could always add a representation of the dependency, making it explicit.

This is beneficial if some implications are now "easily" available; one could check this.

But in general it seems not appropriate to perform some kind of substitution.

But if we are producing a DNFrepresentation of B, then we could project that to {x_1,...,x_n}  {x_i}, and add then the representation of x_i = f(S).

This could also be done for BDDrepresentations.

How to find functional dependencies efficiently?

Which representation of B is best here?

We can always systematically run through all the corresponding partial assignments, and solve a SAT problem.

A possibility here is to use the r_kreductions, which may yield several forced assignments at once.

If B can be represented by a HornCNF (likely we should check this?!), then at least determining a specific dependency is easy; see [Ibaraki, Kogan, Makino; 1999 + 2001; Artificial Intelligence] for further considerations.
Definition in file general.hpp.