OKlibrary  0.2.1.6
DataCollection.hpp File Reference

Tools for getting data from local search algorithms. More...

Go to the source code of this file.


Detailed Description

Tools for getting data from local search algorithms.

In particular we consider running Ubcsat; the tools are typically written in R.

Todo:
run_ubcsat as shell script
  • run_ubcsat should not be an R-function, but a bash script.
  • R is for statistical evaluation, bash for running programs.
  • Nevertheless, since we already have run_ubcsat, it might be sensible to finish it first and only then replace it, so that we have a completed tool which can act as a reference.
  • On the other hand, writing the bash script should be straightforward, using our experience with writing such scripts. So run_ubcsat should not be further developed, but replaced by the bash script.
  • There is a todo for this somewhere; it needs to be connected and updated.
  • Of course, all todos below must be updated and applied to this new situation.
  • The bash script must have all the (good) features run_ubcsat currently has; this includes the form of monitoring.
  • The name could be "ExpRunUbcsat"; compare "Running experiments" in ExperimentSystem/SolverMonitoring/plans/general.hpp.
  • Different from there however here we run many solvers on one instance; perhaps this should be reflected in the name.
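A first sketch of such a bash script might look as follows. All names here (the UBCSAT_WRAPPER override, the file layout, the .result/.error/failed_algorithms files) are illustrative assumptions, not the actual tool; the script runs many algorithms on one instance, records failures instead of aborting, and uses a timestamped experiment directory as discussed below:

```shell
# Illustrative sketch of "ExpRunUbcsat": run several Ubcsat algorithms on
# one instance, one result file per algorithm, collecting failures.
exp_run_ubcsat() {
  local instance=$1; shift
  local runner=${UBCSAT_WRAPPER:-ubcsat-okl}   # hypothetical wrapper name
  local expdir="ubcsat_$(basename "$instance")_$(date +%Y-%m-%d-%H%M%S)"
  mkdir -p "$expdir" || return 1
  local alg
  for alg in "$@"; do
    echo "running $alg"                        # intermediate monitoring
    # one result file per algorithm; a failed run is recorded, not fatal
    if ! $runner -alg "$alg" -i "$instance" \
         > "$expdir/$alg.result" 2> "$expdir/$alg.error"; then
      echo "$alg" >> "$expdir/failed_algorithms"
    fi
  done
  echo "$expdir"
}
```

The failed_algorithms file gives the summary of problematic runs at the end, as requested for segmentation faults.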
Todo:
Input checking
  • Before running, run_ubcsat needs to check all parameters.
  • For example, the algorithm list needs to be checked as to whether all names are correct.
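A minimal sketch of such a check, assuming the admissible names are kept in a list; the names below are a made-up excerpt for illustration, not the real Ubcsat algorithm list:

```shell
# Illustrative parameter check: reject unknown algorithm names up front,
# reporting every bad name rather than stopping at the first.
check_algorithms() {
  local known=" rsaps walksat gsat-tabu adaptnovelty+ "   # made-up excerpt
  local alg ok=0
  for alg in "$@"; do
    case "$known" in
      *" $alg "*) ;;                                      # name is admissible
      *) echo "unknown algorithm: $alg" >&2; ok=1 ;;
    esac
  done
  return $ok
}
```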
Todo:
Bad output should not be incorporated into the dataframe
  • If some algorithm could not be evaluated, then no data for it should be collected into the dataframe.
  • Then eval_ubcsat_dataframe runs correctly, on just the algorithms for which the evaluation worked.
  • Currently this is not the case when incorrect algorithm names are used.
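In the proposed bash setting, one way to achieve this is to assemble the combined data file only from non-empty result files, so that failed runs contribute nothing; the file layout and names are illustrative assumptions:

```shell
# Illustrative collection step: concatenate per-algorithm result files,
# skipping empty files from failed runs and keeping the header row only once.
collect_results() {
  local expdir=$1 out=$2
  local f first=1
  : > "$out"
  for f in "$expdir"/*.result; do
    [ -s "$f" ] || continue            # skip empty results from failed runs
    if [ $first -eq 1 ]; then
      cat "$f" >> "$out"; first=0      # keep the header row of the first file
    else
      tail -n +2 "$f" >> "$out"        # drop the header of later files
    fi
  done
}
```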
Bug:
Incomplete data collection
  • run_ubcsat, as every such tool, must collect *all* the data available from ubcsat.
  • This shall be achieved by using "fubcsat_okl"; see "Better output" in ExperimentSystem/ControllingLocalSearch/plans/Ubcsat-okl.hpp.
  • Also the time needs to be recorded.
    1. However, adding "time" to the list of parameters to be reported always yields "0.000000"?
    2. Apparently this is for "rtd" only.
    3. So it seems to be unavailable? Ask on the Ubcsat mailing list.
    4. It seems some form of runtime information is only available through the statistics, namely "fps" (flips per second) and "totaltime".
    5. We wait for version 2.0.
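If the timing information has to be taken from the statistics output for now, extracting it could look roughly like this; the sample output below is a made-up excerpt for illustration only, not Ubcsat's actual statistics format:

```shell
# Made-up sample of a statistics output containing the two runtime figures.
sample_stats() {
  cat <<'EOF'
FlipsPerSecond = 1529930
TotalTime = 6.54
EOF
}

# Illustrative extraction of "fps" and "totaltime" from such output.
extract_time_stats() {
  awk -F' = ' '
    $1 == "FlipsPerSecond" { print "fps " $2 }
    $1 == "TotalTime"      { print "totaltime " $2 }
  '
}
```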
Todo:
Bad documentation of run_ubcsat
  • Every documentation must be concise and to the point.
  • Instead here we find pages of irrelevant data.
Todo:
Better output of run_ubcsat
  • A parameter file is needed, so that the experiment can be reproduced.
  • DONE So a summary for each (single) run should be printed out.
    1. DONE We need the tabulation of the min-values.
    2. DONE Perhaps it's better not to have an empty line between the playback of the command and the min-tabulation.
    3. DONE Since there are big differences in running times, we also need timing information.
  • The result-files should be directly readable by read_ubcsat, and thus they should include the leading row with the column names.
  • DONE (a warning is now shown for any error for each algorithm, and a summary of all algorithms with warnings is given at the end) Segmentation faults should be very visible (currently they aren't), and they should be summarised at the end of all runs.
  • DONE The directory should have a timestamp; compare RunVdW3k.
  • DONE (now prints algorithm name and summary of min column) An obvious problem with run_ubcsat is that it doesn't give intermediate results: runs in general take quite a time, and one needs to wait until the end.
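The two remaining points, a parameter file for reproducibility and result files that start with the row of column names, could be sketched as follows (file names are illustrative; the column names are those of the renaming proposed for the dataframe columns):

```shell
# Illustrative: record the experiment parameters so it can be reproduced.
write_parameter_file() {
  local expdir=$1 instance=$2 runs=$3 cutoff=$4
  {
    echo "instance: $instance"
    echo "runs: $runs"
    echo "cutoff: $cutoff"
    echo "date: $(date)"
  } > "$expdir/parameters"
}

# Illustrative: start each result file with the leading row of column
# names, so that it is directly readable as a table.
init_result_file() {
  echo "run sat min osteps msteps" > "$1"
}
```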
Todo:
Add new-ubcsat-okl as an option for run_ubcsat
  • DONE (the new version is now the default) We will likely need to run experiments using new-ubcsat-okl before ubcsat-1-2-0 is released, but ubcsat-okl segfaults in various ways using weighted algorithms.
  • DONE (new version is now the default) As we need weighted algorithms, for example in the case of minimising CNF representations (see Investigations/Cryptography/AdvancedEncryptionStandard/plans/Representations/BoxMinimisation.hpp) we need an option to specify that we wish to use "new-ubcsat-okl" instead of "ubcsat-okl".
  • We also need to ensure that all the weighted versions of each algorithm are listed in the algorithms list, as well as new versions.
  • DONE (use ubcsat_wrapper = "old-ubcsat-okl") For some time we still want to be able to use (conveniently) version 1-0-0.
Todo:
Make run_ubcsat interruptible
  • Since it takes a long time to finish a computation, it should be possible to stop the current computation and just use the results obtained so far.
  • No documentation exists on this issue: is this already possible? Or are certain clean-up steps required?
  • It would also be needed to be able to complete a computation later:
    1. First the currently processed algorithm needs to be recorded in a file, so that a continuation can just pick up where the computation was aborted.
    2. Perhaps manual deletion of the files related to the currently processed algorithm is needed: for that it must be clear which files these are.
  • Perhaps for these things we wait for the new version 2.0 of ubcsat, since then our ubcsat tools need to be rewritten anyway, and at that point the pure running-experiments functionality of run_ubcsat can perhaps be handled by a shell script.
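The continuation mechanism discussed above could be sketched with two state files, one naming the algorithm currently being processed (what a restart must clean up) and one listing completed algorithms (what a restart must skip); the file names and the RUN_ONE stand-in are illustrative assumptions:

```shell
# Illustrative resumable driver: skip algorithms already completed in an
# earlier, interrupted invocation; record the one currently in progress.
run_resumable() {
  local expdir=$1; shift
  local alg
  for alg in "$@"; do
    grep -qx "$alg" "$expdir/completed" 2>/dev/null && continue
    echo "$alg" > "$expdir/current"     # what to delete after an abort
    ${RUN_ONE:-true} "$alg"             # stand-in for the actual Ubcsat run
    echo "$alg" >> "$expdir/completed"
    rm -f "$expdir/current"
  done
}
```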
Bug:
Bad columns produced by run_ubcsat
  • "Clauses", "Variables" and other constant measures should not show up in such dataframes.
  • For the data which is independent of the algorithm, a second dataframe should be returned.
  • These names are also inappropriate (see the general standards in the library).
  • Actually, three dataframes are needed:
    1. One with data regarding only the instance.
    2. One with statistics regarding the algorithms (not the runs), e.g., fps (flips per second).
    3. One containing all run-information.
  • DONE (see "Incomplete data collection") There is no need to have more or less of these parameters --- we need them all, in suitable packaging!
  • DONE The point of ubcsat_eval and such tools must be to give convenient access to *all* the data.
  • How to represent algorithms:
    1. And what is the type of the algorithm column? Shouldn't it be a factor, with values given by strings? In any case, its use must be documented.
    2. Access to the factor levels should be possible through the variable run_ubcsat_cnf_algs, however this is not possible.
    3. The "nature" of dataframes needs to be investigated.
    4. At present the "algs" column is a factor, even though the algorithm is given to the dataframe as a string. This is due to the "stringsAsFactors" default of the data.frame constructor.
    5. By default any string columns in a data.frame are converted to factors. To keep these strings as strings, we should set "stringsAsFactors" to FALSE.
  • DONE The column-names should be identical to the names used by ubcsat (in the output!).
  • So "found -> sat", "best -> min", "beststep -> osteps", "steps -> msteps". And references to these columns must be replaced in all files (typically these references use e.g. E$best).
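For existing result files, the renaming of the header row could be sketched as follows; only the first line of the file is touched, and "beststep" is handled as a whole field so it is not caught by the "best" or "steps" renames:

```shell
# Illustrative header renaming: found -> sat, best -> min,
# beststep -> osteps, steps -> msteps (first line only).
rename_columns() {
  awk 'NR == 1 {
         for (i = 1; i <= NF; ++i) {
           if ($i == "found") $i = "sat"
           else if ($i == "best") $i = "min"
           else if ($i == "beststep") $i = "osteps"
           else if ($i == "steps") $i = "msteps"
         }
       }
       { print }' "$1"
}
```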
Bug:
Missing evaluation tools
Todo:
Parallelisation
  • There should be an additional parameter "threads", with default-value 1, which specifies the number of threads used.
  • The runs of the algorithms are then simply distributed over the threads.
  • This should be a fairly easy task.
  • And it would be useful: the running time can be substantial, while one wants the results as quickly as possible in order to start the real experiments.
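A sketch of the proposed "threads" parameter in the bash setting, using xargs -P to distribute the per-algorithm runs; the RUN_ONE stand-in replaces the actual Ubcsat invocation and is an illustrative assumption:

```shell
# Illustrative parallel driver: one process per algorithm, at most
# $threads of them running at the same time.
run_parallel() {
  local threads=$1; shift
  printf '%s\n' "$@" | xargs -n 1 -P "$threads" "${RUN_ONE:-echo}"
}
```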

Definition in file DataCollection.hpp.