general.hpp File Reference

On investigations into Green-Tao problems (including Green-Tao numbers) More...

Go to the source code of this file.

Detailed Description

On investigations into Green-Tao problems (including Green-Tao numbers)

Prepare benchmarks for SAT 2011
Standardisation on notation : DONE
  • For Green-Tao and for Van-der-Waerden numbers we use the following notation from now on.
  • The number of parts is the index of the function name ("greentao" or "vanderwaerden").
  • We always write out this index.
  • For diagonal problems, we have then exactly one argument, the length of the arithmetic progression in each part, and we add "d" to the function-name.
  • While for non-diagonal problems we have as many arguments as the index, each the length of the arithmetic progression in the corresponding part.
Trivial Green-Tao numbers : DONE (handling this elsewhere)
  • greentao_p(k) for natural numbers p,k >= 0 is the smallest natural number n >= 0 such that partitioning the first n prime numbers into p parts is guaranteed to contain an arithmetic progression of size k in some part.
    1. greentao_2(0) = 0
    2. greentao_2(1) = 1
    3. greentao_2(2) = 3
  • greentao_2(k1,k2) for k1,k2 >= 0 is the smallest natural number n >= 0 such that partitioning the first n prime numbers into 2 parts is guaranteed to contain either an an arithmetic progression of size k1 in the first part or an an arithmetic progression of size k2 in the second part.
    1. greentao_2(0,k) = 0
    2. greentao_2(1,k) is the smallest n such that the first n prime numbers contain an arithmetic progression of size k: see "Finding the first arithmetic progression" in Investigations/RamseyTheory/GreenTaoProblems/plans/AdditiveNumberTheory.hpp .
    3. greentao_2(2,k) is the smallest n such that after removing any prime from the first n prime numbers we always have an arithmetic progression of size k. See "The first arithmetic progression allowing a missing number" in Investigations/RamseyTheory/GreenTaoProblems/plans/AdditiveNumberTheory.hpp .
greentao_2(3) = 23
  • greentao_2(3) = 23 (partitioning the first 23 prime numbers into 2 parts, one part is guaranteed to contain an arithmetic progression of size 3, while using a smaller initial segment of prime numbers won't do).
  • Trivial (for OKsolver-2002, and likely for any solver).
DONE (the basic algorithm works not too bad now) Better creation of problems
  • For n in this magnitude the Maxima computation of the hypergraph is already very slow --- a more intelligent algorithm for finding the arithmetic progression amongst the prime numbers is needed (likely we cannot exploit the speciality of prime numbers, but we do it for arbitrary lists of numbers).
    1. DONE One could use memoisation in the form that for every n we store the additional hyperedges (k-progressions).
    2. DONE (doesn't improve a single computation, but several) This would it make rather quick, without imposing big memory burdens.
Best local search solver : DONE (there is no general best algorithm)
  • From the ubcsat-suite rnovelty+ seems best.
  • See what's in ubcsat-1.1.0.
  • And then we should try walksat from Kautz, which has several options (amongst them rnovelty+).
    1. Again, rnovelty+ seems far best.
Connecting different n
  • We should find out what the falsified clause for the above nearly satisfying assignment for n=33000 is --- if m is the maximum variable (index) in the clause then we have a satisfying assignment for n = m-1.
    1. This holds in general for such monotone sequences of clause-sets.
    2. We should write a little C++ program, which takes the assignment returned by Ubcsat (output by using option "-r best") and the clause-set, and outputs the falsified clauses.
Phase transition
  • One should study the density of the clause-sets (and the "threshold") here.
    1. The density 3.5 for unsatisfiable k=4 is somewhat similar to the random 3-SAT threshold (around 4.25 --- though for larger n).
    2. The density 8.8 for unsatisfiable k=4 is similar to the random 4-SAT threshold (around 9.8).
    3. The random 5-SAT threshold is around 20.
    4. One could guess that the unsatisfiability-density comes closer to the random-k-SAT threshold density?
    5. Then one needed to figure out how many k-progressions are in the first n primes; see "The distribution of arithmetic progression amongst primes" in Experimentation/Investigations/plans/AdditiveNumberTheory.hpp.
    6. It would be interesting to study random complement-invariant k-SAT clause-sets (choose a random k-clause-set, and take the union with the complement)!
    7. I (OK) would assume that the van-der-Waerden clause-sets are much more redundant than the Gree-Tao clause-sets, and that the latter are much closer to random clause-sets.
    8. A general conjecture is that we have the Ramsey property if the density goes to infinity for each fixed size k of the structure required to exist (arithmetic progressions, cliques, etc.; in this general form likely one should find a counter-example, but perhaps it holds if the the structures "spread a bit").
Faster generation of arithmetic progression of primes
  • A major bottleneck is the time needed to create Green-Tao problems.
  • Via local search we might even investigate greentao_2(6), but here n might go into the millions, and we need a much faster generator.
  • In Satisfiability/Transformers/Generators/GTSat and GTdSat we have C++ generators for boolean problems and for the direct translation for non-boolean problems.
  • And also the sequences length(arithprog_primes_finish[k,n]) for fixed k and length(arithprog_primes(k,n)) for fixed k should be of interest.
    1. Shall this go into a PostgreSQL database, or into a simple file, containing lines "no., prime, count of sequences ending with prime, cumulative count". ? The file looks alright (and can be easily expanded).
    2. We should also provide column headings, so that it can be directly read into R.
    3. But also Maxima should have no problems reading these files.
    4. These files need to be provided in a data section of the OKlibrary.
Faster local search
  • For greentao it seems the only structure which can be exploited is the complement-invariance.
    1. More precisely, we have a hypergraph colouring problem.
    2. So we have complement-invariant clause-sets, and furthermore we have a PN-clause-set (so regarding space, we can save one half of the clauses, and the clauses need only contain positive literals, i.e., variables, not literals).
    3. It seems desirable to have a specialised local search solver for hypergraph colouring (as in instance of generalised SAT); since local search solvers only use total assignments, the non-stability of hypergraph colouring under partial assignments is no hindrance (while we have stability under autarkies).
    4. The more colours are to be used, the bigger the savings.
    5. And one would assume the various heuristics are influenced by the restriction to hypergraph colouring.
    6. First we should only implement adaptnovelty+ and rnovelty+, and this first at the Maxima/Lisp level, followed directly by a C++ implementation; see ComputerAlgebra/Satisfiability/Lisp/LocalSearch/plans/HypergraphColouring.hpp.
  • For vanderwaerden there is much more structure which could be exploited (using "virtual" clause-sets).
  • We should try to understand why the different local search algorithms behave so differently on the various problem classes.
    1. See chapter 6 in [Hoos, Stuetzle, Stochastic Local Search] for background information on the algorithms involved.
    2. For the van der Waerden problems and the Green-Tao problems it should be possible to gain quite good quantitative experimental understanding.
    3. See chapter 4 in [Hoos, Stuetzle, Stochastic Local Search] for material on statistical evaluation.
  • "Meta-heuristics":
    1. General meta-heuristics are needed, which can be adapted to the specific problems.
    2. A natural first example would be first to identify the best solver from the suite, then trying to optimise it, and then search for solutions by starting with, say (just an example) 1000 seeds, running them a bit, filtering out the 100 most promising ones, running them further, filtering out the 10 best, running them, finally filtering out the best one (or more --- depending on the number of processes to be run!).
    3. Of course, this all automatic (with good monitoring).
    4. One needs to gain quantitative understanding of the local search process, so that progress can be evaluated; see above.
    5. All algorithms and programs are written in a natural generative style, but specific to the problem set (van der Waerden and Green-Tao problems here --- even them treated individually).
    6. Perhaps the whole thing is written in R first, using Ubcsat; see ExperimentSystem/ControllingLocalSearch/plans/general.hpp.
    7. And (of course) also at the Maxima/Lisp level, this time using Maxima/Lisp local search algorithms; see ComputerAlgebra/Satisfiability/Lisp/LocalSearch/plans/general.hpp.
Survey propagation
  • To search for literature, we can search on the Internet for the sequence (1,3,23,512) (greentao_2(i) for i=1,2,3,4).
  • Likely this sequence is not in that Internet database, and we should submit it (once our article has appeared; or perhaps the report is enough).
  • Actually there is sequence A134050: 1, 1, 3, 23, 512, 34939 ?!?
  • For index 0 this sequence has value 1, while we have value 0; it seems unbelievable that this rather easily calculated sequence should coincide with greentao_2(i).
  • We can also investigate greentao_2(i,i+1) fuer i >= 1; the values we know are
    1. greentao_2(1,2) = 2
    2. greentao_2(2,3) = 7
    3. greentao_2(3,4) = 79
    There is A128293 with first five values 1, 2, 7, 79, 4235 ?!?
  • Again, for i=0 the first value for us would be 0; and again it seems unbelievable that this sequence should have something to do with ours.

Definition in file general.hpp.