On investigations into GreenTao problems (including GreenTao numbers)
More...
Go to the source code of this file.
Detailed Description
On investigations into GreenTao problems (including GreenTao numbers)
 Todo:
 Prepare benchmarks for SAT 2011
 Todo:
 Standardisation on notation : DONE

For GreenTao and for VanderWaerden numbers we use the following notation from now on.

The number of parts is the index of the function name ("greentao" or "vanderwaerden").

We always write out this index.

For diagonal problems, we have then exactly one argument, the length of the arithmetic progression in each part, and we add "d" to the functionname.

While for nondiagonal problems we have as many arguments as the index, each the length of the arithmetic progression in the corresponding part.
 Todo:
 Trivial GreenTao numbers : DONE (handling this elsewhere)

greentao_p(k) for natural numbers p,k >= 0 is the smallest natural number n >= 0 such that partitioning the first n prime numbers into p parts is guaranteed to contain an arithmetic progression of size k in some part.

greentao_2(0) = 0

greentao_2(1) = 1

greentao_2(2) = 3

greentao_2(k1,k2) for k1,k2 >= 0 is the smallest natural number n >= 0 such that partitioning the first n prime numbers into 2 parts is guaranteed to contain either an an arithmetic progression of size k1 in the first part or an an arithmetic progression of size k2 in the second part.

greentao_2(0,k) = 0

greentao_2(1,k) is the smallest n such that the first n prime numbers contain an arithmetic progression of size k: see "Finding the first
arithmetic progression" in Investigations/RamseyTheory/GreenTaoProblems/plans/AdditiveNumberTheory.hpp .

greentao_2(2,k) is the smallest n such that after removing any prime from the first n prime numbers we always have an arithmetic progression of size k. See "The first arithmetic progression allowing a
missing number" in Investigations/RamseyTheory/GreenTaoProblems/plans/AdditiveNumberTheory.hpp .
 Todo:
 greentao_2(3) = 23

greentao_2(3) = 23 (partitioning the first 23 prime numbers into 2 parts, one part is guaranteed to contain an arithmetic progression of size 3, while using a smaller initial segment of prime numbers won't do).

Trivial (for OKsolver2002, and likely for any solver).
 Todo:
 DONE (the basic algorithm works not too bad now) Better creation of problems

For n in this magnitude the Maxima computation of the hypergraph is already very slow  a more intelligent algorithm for finding the arithmetic progression amongst the prime numbers is needed (likely we cannot exploit the speciality of prime numbers, but we do it for arbitrary lists of numbers).

DONE One could use memoisation in the form that for every n we store the additional hyperedges (kprogressions).

DONE (doesn't improve a single computation, but several) This would it make rather quick, without imposing big memory burdens.
 Todo:
 Best local search solver : DONE (there is no general best algorithm)

From the ubcsatsuite rnovelty+ seems best.

See what's in ubcsat1.1.0.

And then we should try walksat from Kautz, which has several options (amongst them rnovelty+).

Again, rnovelty+ seems far best.
 Todo:
 Connecting different n

We should find out what the falsified clause for the above nearly satisfying assignment for n=33000 is  if m is the maximum variable (index) in the clause then we have a satisfying assignment for n = m1.

This holds in general for such monotone sequences of clausesets.

We should write a little C++ program, which takes the assignment returned by Ubcsat (output by using option "r best") and the clauseset, and outputs the falsified clauses.
 Todo:
 Phase transition

One should study the density of the clausesets (and the "threshold") here.

The density 3.5 for unsatisfiable k=4 is somewhat similar to the random 3SAT threshold (around 4.25  though for larger n).

The density 8.8 for unsatisfiable k=4 is similar to the random 4SAT threshold (around 9.8).

The random 5SAT threshold is around 20.

One could guess that the unsatisfiabilitydensity comes closer to the randomkSAT threshold density?

Then one needed to figure out how many kprogressions are in the first n primes; see "The distribution of arithmetic progression amongst
primes" in Experimentation/Investigations/plans/AdditiveNumberTheory.hpp.

It would be interesting to study random complementinvariant kSAT clausesets (choose a random kclauseset, and take the union with the complement)!

I (OK) would assume that the vanderWaerden clausesets are much more redundant than the GreeTao clausesets, and that the latter are much closer to random clausesets.

A general conjecture is that we have the Ramsey property if the density goes to infinity for each fixed size k of the structure required to exist (arithmetic progressions, cliques, etc.; in this general form likely one should find a counterexample, but perhaps it holds if the the structures "spread a bit").
 Todo:
 Faster generation of arithmetic progression of primes

A major bottleneck is the time needed to create GreenTao problems.

Via local search we might even investigate greentao_2(6), but here n might go into the millions, and we need a much faster generator.

In Satisfiability/Transformers/Generators/GTSat and GTdSat we have C++ generators for boolean problems and for the direct translation for nonboolean problems.

And also the sequences length(arithprog_primes_finish[k,n]) for fixed k and length(arithprog_primes(k,n)) for fixed k should be of interest.

Shall this go into a PostgreSQL database, or into a simple file, containing lines "no., prime, count of sequences ending with prime, cumulative count". ? The file looks alright (and can be easily expanded).

We should also provide column headings, so that it can be directly read into R.

But also Maxima should have no problems reading these files.

These files need to be provided in a data section of the OKlibrary.
 Todo:
 Faster local search

For greentao it seems the only structure which can be exploited is the complementinvariance.

More precisely, we have a hypergraph colouring problem.

So we have complementinvariant clausesets, and furthermore we have a PNclauseset (so regarding space, we can save one half of the clauses, and the clauses need only contain positive literals, i.e., variables, not literals).

It seems desirable to have a specialised local search solver for hypergraph colouring (as in instance of generalised SAT); since local search solvers only use total assignments, the nonstability of hypergraph colouring under partial assignments is no hindrance (while we have stability under autarkies).

The more colours are to be used, the bigger the savings.

And one would assume the various heuristics are influenced by the restriction to hypergraph colouring.

First we should only implement adaptnovelty+ and rnovelty+, and this first at the Maxima/Lisp level, followed directly by a C++ implementation; see ComputerAlgebra/Satisfiability/Lisp/LocalSearch/plans/HypergraphColouring.hpp.

For vanderwaerden there is much more structure which could be exploited (using "virtual" clausesets).

We should try to understand why the different local search algorithms behave so differently on the various problem classes.

See chapter 6 in [Hoos, Stuetzle, Stochastic Local Search] for background information on the algorithms involved.

For the van der Waerden problems and the GreenTao problems it should be possible to gain quite good quantitative experimental understanding.

See chapter 4 in [Hoos, Stuetzle, Stochastic Local Search] for material on statistical evaluation.

"Metaheuristics":

General metaheuristics are needed, which can be adapted to the specific problems.

A natural first example would be first to identify the best solver from the suite, then trying to optimise it, and then search for solutions by starting with, say (just an example) 1000 seeds, running them a bit, filtering out the 100 most promising ones, running them further, filtering out the 10 best, running them, finally filtering out the best one (or more  depending on the number of processes to be run!).

Of course, this all automatic (with good monitoring).

One needs to gain quantitative understanding of the local search process, so that progress can be evaluated; see above.

All algorithms and programs are written in a natural generative style, but specific to the problem set (van der Waerden and GreenTao problems here  even them treated individually).

Perhaps the whole thing is written in R first, using Ubcsat; see ExperimentSystem/ControllingLocalSearch/plans/general.hpp.

And (of course) also at the Maxima/Lisp level, this time using Maxima/Lisp local search algorithms; see ComputerAlgebra/Satisfiability/Lisp/LocalSearch/plans/general.hpp.
 Todo:
 Survey propagation
 Todo:
 Literature

To search for literature, we can search on the Internet for the sequence (1,3,23,512) (greentao_2(i) for i=1,2,3,4).

Likely this sequence is not in that Internet database, and we should submit it (once our article has appeared; or perhaps the report is enough).

Actually there is sequence A134050: 1, 1, 3, 23, 512, 34939 ?!?

For index 0 this sequence has value 1, while we have value 0; it seems unbelievable that this rather easily calculated sequence should coincide with greentao_2(i).

We can also investigate greentao_2(i,i+1) fuer i >= 1; the values we know are

greentao_2(1,2) = 2

greentao_2(2,3) = 7

greentao_2(3,4) = 79
There is A128293 with first five values 1, 2, 7, 79, 4235 ?!?

Again, for i=0 the first value for us would be 0; and again it seems unbelievable that this sequence should have something to do with ours.
Definition in file general.hpp.