On investigations regarding Sudoku problems (standard, and extended)
More...
Go to the source code of this file.
Detailed Description
On investigations regarding Sudoku problems (standard, and extended)
Besides experimentations in creating and solving big Sudoku problems (with or without the uniqueness constraint) one should harness the popularity of Sudoku to teach (generalised) satisfiability.
 Todo:
 Connections
 Todo:
 Translations to boolean CNFs based on the direct encoding

In ComputerAlgebra/Satisfiability/Lisp/Generators/Sudoku.mac we provide three translations:

output_weak_sdk_stdname

output_dual_weak_sdk_stdname

output_strong_sdk_stdname

In general we have four types of clauses for the direct encoding:

primary binary clauses: different numbers per line

primary long clauses: every field gets a number

dual binary clauses: at most one number per field

dual long clauses: every line contains all numbers

Thus we get 2^4=16 possible combinations, but not all combinations are valid:

Long clauses are needed, otherwise no field getting a number would be a solution.

And at least all primary or all dual clauses are needed.

So we have 3 + 3 + 1 = 7 valid possibilities.

How to call them? The primary clauses (only) has been called already "weak", only the dual clauses "dual weak", and primary and dual together "strong". Remain the 4 mixed cases.

Perhaps "weak pb", "weak pl", "dual weak pb" and "dual weak pl" ("p" for "plus") ?

Previous experimentations would suggest that "weak pl" is a good combination.
 Todo:
 Finding hard instances for boxdimension 3

A source for "hard" instances is [extreme Su Doku; Wayne Gould].

The covertext says "... Wayne Gould, who'll take you deep inside his
genius psyche.", and "..., he has been named by Time Magazine as one of
the 100 people who shaped the world.".

One sees that enlightenment is needed.

As one can see in ComputerAlgebra/Satisfiability/Lisp/Generators/docus/Sudoku.hpp, apparently most problems only need one r_2reduction, while perhaps the most difficult instance in that collection needs five r_2reductions (without backtracking, and thus also uniqueness is confirmed).

So likely this source is without further interest.

A better source is [Denis Berthier, The Hidden Logic of Sudoku, 2007]

See the collections at http://www.carva.org/denis.berthier .

This source only consider these "humanoid rules" (AInonsense).

Perhaps all these rules are covered by r_2 for the strong encoding?

In principle the alldifferent constraint can not be covered, but perhaps this doesn't happen for these small problems (boxdimension 3)?

We should search for problems which are as "hard" as possible.

See "Sampling of minimally uniquely satisfiable problems" in ComputerAlgebra/Satisfiability/Lisp/Generators/plans/Sudoku.hpp.

The "EasterMonster" (see Satisfiability/Lisp/Generators/docus/Sudoku.hpp) needs 9 nodes with the OKsolver_2002; this is the hardest example yet encountered.

The right approach should be to compute the thardness, phardness and whardness!

For uniquely satisfying solutions these hardnesses are given by the hardness of the unsatisfiable problems given by setting (just) one variable to a wrong value.

This needed to be computed for all known instances.

phardness 1 means that by directly applying the most basic rules all fields are filled.

phardness 2 and hardness 1 means that one needs to fill in one at least one false number, and then apply "basic" reasoning.

Note that in general a hardness value k is not just the worstcase, but also determines the bestcase for reasoning: by setting or forbidding values for fields (i.e., assigning a value to a variable) we need precisely r_{k1} to refute the false values and solve all fields (one after another).

See "Hardness of Sudoku problems" in Investigations/BooleanFunctions/plans/Hardness/general.hpp.

This should replace the following older discussion, based on solverruns.

There should be harder *unsatisfiable* problems.

While for the uniquely satisfiable problems actually it would be good if the OKsolver could enumerate all solution  since here this would yield the proof of uniqueness, and only this would be comparable to the rulebased approach, which also always shows uniqueness.

At least we need to extract the solution, add it negated to the conditions, and see what the complexity of the refutation is.

This refutation also contains the direct proof that the given total assignment is the unique solution.

Perhaps it is then enough to search for short (treelike) resolution refutations of the original sdkclauseset ("empty field") plus the negated (total) satisfying assignment.

The refutation of the "EasterMonster" with negated solution actually doesn't get any harder than finding the solution; see Satisfiability/Lisp/Generators/docus/Sudoku.hpp.

See below for the "human editing".

For the "human" solution of such problems, "explanations" are needed for the solution, which is just a demonstration of the sequence of forced assignments.

In [extreme Su Doku; Wayne Gould] the condition "without reductio ad
absurdum" is stated. Can this be made precise?

In [Denis Berthier The Hidden Logic of Sudoku, 2007] one finds a more systematic approach, and also an attempt at defining what "without
guessing" could mean.

Perhaps r_2reduction would not be eligible here, and perhaps one should use fullstrength bijectivityconstraints?

Of course, one can simulate r_2reduction via resolution, which is "forward reasoning"; so perhaps one should search for short resolution derivations of the forced assignments.

Actually, the last remark seems crucial, showing that all that talk about "trial and error" and so on is nonsense: Every forced assignment can be demonstrated by r_k for k big enough, and this "trialanderror" derivation can be translated into a "direct logical reasoning", namely a resolution refutation (of the unitclause corresponding to the forced assignment)!

One should do so for example for the above "EasterMonster"example, where [Denis Berthier] claims "no resolution rule is yet known to be
applicable to this puzzle".

He speaks of "rules", while the resolution proof is not a general rule; so well, perhaps some form of automated generalisation is possible?

The full refutation of the EasterMonster with negation is just 9 nodes (and 100 r_2reductions). Of course a lot of unitpropagations are involved. A good representation of this refutation would be valuable.
 Todo:
 Finding hard instances for boxdimension >3
 Todo:
 Tools for investigations with concrete Sudoku instances

cnf_puzzle
NAME
cnf_puzzle
SYNOPSIS
cnf_puzzle [OPTIONS] empty puzzle
OPTIONS
c FILE
Add the clauses in FILE to the output.
DESCRIPTION
cnf_puzzle outputs a Sudoku puzzle in Dimacscnfformat
according to the direct encoding. The file "empty" is a
cnffile in Dimacs format representing an empty Sudoku puzzle.
"puzzle" is a string representing a concrete instance of a
Sudoku puzzle. The puzzlestring contains 81 digits with
values between 0 and 9. 0 represents an empty cell. The
fixed fields of the puzzle are specified by nonzero values.
If position 1 <= i <= 81 contains nonzero value k then the final Sudoku
puzzle output has fixed field ((i1)/9+1, (i1)%9+1) with value k, where
/ is integer division and % is the modulus operator.
An optional file argument allows arbitrary other clauses to
be added to the output. (The intention being to allow the
negation of a satisfying assignment).

It would be good if optionally instead of the string "empty" the script could read from standard input.

Also good if there would be an option "strict", which only accepts the string as specified above, while otherwise spacesymbols in the string are ignored, and missing values at the end are automatically added as zeros.

Is "FILE" also in Dimacsformat? The point here is about the preamble, the initial comments plus the parameterline.

DONE (good enough for the beginning) Perhaps allowing arbitrary extra clauses to be appended to the output is a bad idea. Really we only want to allow precisely the negation of the (unique?) satisfying assignment?

DONE (good enough for the beginning) As above, the script is not especially flexible because it has to be called once per puzzle. Nicer would be to allow multiple concrete puzzles as input but then how to control the output so that different puzzles are output to different files?

Larger dimensions:

Possibly the script should be able to handle puzzles of larger dimension but then a different format for the puzzle input is needed.

We should have a look how at the various webpages these larger instances are specified.
Definition in file general.hpp.