Tools for guessing the first point of unsatisfiability in a series of problems (with one turning point)
Detailed Description
 Todo:
 Connections
 Todo:
 Finding the n where a problem series changes from SAT to UNSAT

For problems coming from Ramsey theory we are given a CNF generator G(n), where n is a natural number, and where for some (unknown) n_0 it is known that G(n') is unsatisfiable iff n' >= n_0.

We assume computation of G(n) is fast (a few seconds, say); so typically a C++ program is needed.

Now via local search an "educated guess" for n_0 is sought: that is, a lower bound n_0' <= n_0 (where the problem is known to be satisfiable), such that n_0'+1 seems unsatisfiable.

The algorithm alg from the ubcsat suite is given (it has been selected beforehand).

We always use option "solve", to stop once a solution has been found.

A reasonable strategy seems to be the following ("big steps, followed by scanning with small steps, followed by systematic exploration (minimum steps)"):

Start with some n for which n <= n_0 is known.

For establishing n the algorithm starts with a, b, where a is a known lower bound, while b is assumed to be an upper bound.

Then via bisection we find n, where not too much effort is spent on upper bounds.

Alternatively we compute s = (b-a) / G ("G" like "groups"), and consider a+s, a+2s, ... . Perhaps this is better, since then we don't have to deal with possibly hard unsatisfiable problems.

One starts with cutoff=10^4 and uses runs=10, increasing the cutoff by a factor of 2 if necessary, until an upper bound for the cutoff is reached, say 10^7 (of course, all these numbers are parameters of the algorithm, with these default values).

Perhaps G=100 is a reasonable default value.

For the upper bound b one assumes that 10 runs with cutoff=10^8 only reach a minimum of, say, 10.

Also for establishing b we should use some algorithm, one which uses bigger steps, say here s=1000, and a fixed cutoff=10^8.

The procedure will now increase n in steps of 1: given that unsatisfiable problems in this area are very hard, and that via local search we don't have a reasonable grasp on them, this seems best (avoiding jumps into unknown areas).

First an appropriate cutoff is computed:

The start cutoff is cutoff_start as given by the above computation.

t runs of ubcsat::alg are performed:

If a solution was found, then we go to the next n.

If the frequency of runs with min=1 is less than p, then the cutoff is multiplied by cfactor, and again t runs are performed.

Otherwise the current cutoff is considered appropriate.

For t a value t=10 seems reasonable.

For p a value p=0.2 (i.e., 20%) seems alright.

For cfactor currently I (OK) use cfactor=10; cfactor=2 seems ineffective, but perhaps cfactor=5 is a good compromise.

If cutoff becomes larger than cbound, the whole process is stopped (and the current n is taken as a guess for n_0).

Requiring that min=1 must occur, and this with a certain frequency, is based on the experiences with adaptnovelty+, which has a quite small volatility; we could generalise this to ask for runs with min <= min_required, where for more volatile algorithms one could use up to min_required = 5.

Now the main search is performed:

M "meta-runs" of ubcsat, with M' runs each, are performed.

If a solution was found then n is increased.

Otherwise the cutoff is multiplied by cfactor, and we repeat.

The division of the M*M' ubcsat runs into M packages is there so that for data evaluation one doesn't need to wait until the end of the whole computation; but the packages shouldn't be too small (i.e., M too large), since each ubcsat invocation has to read the formula anew, which takes time.

But in order to make this happen, the ubcsat runs actually have to happen in the background, only informing R about the completion of a package: is this possible?

M=10 and M'=100 seems reasonable.

Instead of increasing the cutoff, one could also increase M: perhaps if the average of osteps for the runs achieving the minimum is less than q*cutoff, then multiplying M by rfactor is sensible, where q = 0.4 and rfactor = 2.

The upper bound for M should perhaps be 80: If this is reached, then M is reset to the start value, cutoff is multiplied by cfactor, and the main search is continued.

Everything is logged to files.

What is a suitable programming language here?

Since a lot of variation and further development is needed here, perhaps we should already start with a C++ program in (mainly) object-oriented style.

But all external calls are handled by wrapper scripts, which make handling of the various inputs and outputs very easy for the C++ program.

On the other hand, perhaps R would be easier?

Maxima is also possible; but there we have only "system" available, without access to the return value, and we also have (big) problems with the initialisation, while with R we can set HOME appropriately; so either R or C++.

Perhaps R is most suitable here; also for later developments we might use more sophisticated statistical evaluations.

Though C++ feels better, and at this time there are no complicated statistics involved.

On the underlying assumptions:

"Most" problems seem to have just one "centre" of the min-distribution, and the task is to "push hard" to move this centre towards min=0.

Of course, using a low cutoff value one will see a high min-average and a few outliers with much lower min-values; however for "most" cases it seems ineffective to go in this direction, that is, to use many runs with relatively low cutoff values.

One exception is vanderwaerden_2(4,9), where apparently all ubcsat algorithms show "two centres", the main one with a high min-average, and the second one with an average close to zero. Here it seems ineffective to increase the cutoff value to force the two centres to merge. See RamseyTheory/VanderWaerdenProblems/plans/VanderWaerden_24k.hpp.
 Todo:
 Simpler strategies

We should start with several scripts, each just implementing one simple strategy.

Simplest: just use a fixed cutoff and number of rounds:

There are five input parameters:

n, the start value for the generator

cutoff

number of rounds

ubcsat algorithm

script for generating Dimacs files.

The algorithm starts with n and runs ubcsat; if a solution is found then it is recorded and n is increased, otherwise the algorithm stops.

A directory is created by the script, which contains all data, in such a form that if interrupted, just restarting the script will continue (nearly) where it was interrupted.

The generation script takes n as input and creates the Dimacs file "current.cnf" in the output directory.

Overall architecture:

In principle we have the three stages "preparation", "running", "transfer".

Preparation creates the experiment directory.

Running just means calling the run-experiment program with the experiment directory as parameter (whether it is the first run or a restart doesn't matter).

Perhaps regarding "transfer" we do nothing yet; later we might transfer the data into a database.

Monitoring the state of the process:

For monitoring purposes there should be one monitoring file, called "monitor", in which the current activity and the past results are recorded; so just monitoring the last 10 lines of this file should suffice to get the basic information.

The past results are just lines containing n, seed and osteps for each solution found, and otherwise just a message that no solution was found.

At the beginning of the "monitor" file all meta-information is given.

Detecting that the process was interrupted needs some global control.

Though having some simple activity signal would be nice.

Simplest is just to append all ubcsat output to a file "output", so that monitoring that file shows what is going on.

Likely there is no need for more fine-grained recording.

However we want the possibility of evaluation in R. So "output" must be R-readable; perhaps just prefixing all ubcsat-okl output with n, and omitting the satisfying assignments and the other ubcsat output, should suffice.

Detection of the result of a ubcsat invocation:

Best of course would be if Ubcsat returned a Dimacs return code.

Contact the Ubcsat group about how best to do this.

Data stored in the experiment directory:

Likely just storing the parameters, each in its own file, is sufficient.

The file containing n would be updated after a run which found a solution. So restarting just repeats the current n.

A first simple example, with hardcoded generator, is RunVdW3k in Experimentation/Investigations/RamseyTheory/VanderWaerdenProblems/.

Find the first point where finding a satisfying assignment isn't that easy anymore:

Motivated by Experimentation/Investigations/RamseyTheory/HindmanProblems/plans/Hindman_2^a(2).hpp where ubcsat::rsaps behaves very predictably.

Here the strategy would be just to use a fixed cutoff, and 10 rounds, and to determine the transition point.

If mixed cases occur (sat and unsat), then the program simply stops.

So here we do not stop once a solution was found.
Definition in file PointOfUnsatisfiability.hpp.