general.hpp File Reference

Launching and monitoring experiments on remote machines. More...

Go to the source code of this file.


namespace  OKlib::ExperimentSystem

Components for performing experiments.

namespace  OKlib

All components of the OKlibrary.

Detailed Description

Launching and monitoring experiments on remote machines.

Update namespaces
  • The script "DataCollection.R" in Experimentation/ExperimentSystem is misnamed, since it considers only some very special form of experiments.
  • It is also misplaced, as a special tool.
  • A directory ExperimentSystem/HelpingInvestigations shall be created, with subdirectories having the same names as those in Experimentation/Investigations, where special R-scripts helping these investigations are placed.
  • ExperimentSystem/DataCollection.R shall be moved there, with all appropriate updates.
  • At that point a proper specification of the functionality currently in ExperimentSystem/DataCollection.R is needed.
Simple script for monitoring remote processes
  • See "Translation via addition" in RamseyTheory/VanderWaerdenProblems/Transversals/plans/UsingSAT.hpp for first plans.
  • See "Adding distribution power to SplittingViaOKsolver" in ExperimentSystem/plans/DistributedSolving.hpp for a concrete project.
  • See "Launching and monitoring" for more specific requirements.
  • Such a script perhaps visits each process every hour, restarts it if necessary, and also performs some output-action, storing the result in a file dedicated to the experiment.
  • So in most cases one just needs to inspect local files, and, more importantly, one is sure that the experiment runs continuously.
  • In case the process can't be restarted an e-mail is sent to the administrator.
  • It seems easiest just to write the various outputs into files; one could also think of adding a line to a table in an html document or an rss feed, but I (OK) definitely prefer simple files.
  • The experiments are stored via a simple format in a file, which can be arbitrarily changed (but one should make sure the hourly action is not happening just then, so one should only work with a copy of the configuration file, overwriting the old file only at the end).
  • A configuration line could contain the command to login, the command to check whether the process is running, the command to re-start the process, and the command for producing output.
  • How can we find out whether *new* output has happened, and show only that?
  • Using "ps" for checking the status of a process seems appropriate; but one should not check only the pid (another process could have been assigned the same pid).
  • First step is to transfer the experiment-system from OKgenerator (see directory ExperimentSystem/RandomGenerator) to here.
  • Compilation is not an issue yet, but renaming and initial documentation are.
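The configuration format and the "new output" question above can be sketched as follows. This is only an illustrative Python sketch under assumptions of mine: the function names (parse_config, new_output) and the ':'-separated line format are made up for illustration and are not part of any existing tool.

```python
def parse_config(text):
    """Parse one experiment per line, with four ':'-separated fields as
    discussed: the login command, the command checking whether the process
    runs, the restart command, and the command producing output.
    Blank lines and '#'-comments are skipped.  (Hypothetical format.)"""
    experiments = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        login, check, restart, output = (f.strip() for f in line.split(":", 3))
        experiments.append({"login": login, "check": check,
                            "restart": restart, "output": output})
    return experiments

def new_output(full_output, seen):
    """Show only *new* output: return the part of full_output not shown
    before, plus the new offset to remember for the next hourly visit."""
    return full_output[seen:], len(full_output)
```

The hourly visitor would then iterate over parse_config's result, run each "check" command via the "login" command, restart if necessary, and append only the new_output part to the experiment's local file.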
Improve naming and documentation of rows2columns_df
  • "rows2columns_df" in DataCollection.R was a quickly written function allowing one to aggregate data from various rows in a data.frame. Certain columns remain constant, while a "key" column ranges over a fixed set of keys, each associated with an entry of a "value" column. The rows are aggregated so that a new column is introduced for each key, and there is one row for each unique combination of values in the constant columns; in that row each key-column holds the "value" entry associated with that key.
  • The above needs to be better written and placed in the documentation.
  • The name and placement of the function also need to be considered.
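The aggregation described above can be illustrated in pure Python. This is a hypothetical sketch of the idea, not the R implementation; the function name and argument names are invented for illustration.

```python
def rows2columns(rows, constant, key, value):
    """Aggregate a list of row-dicts: one output row per unique combination
    of the 'constant' columns, with a new column per observed key holding
    the associated 'value' entry (mirroring the rows2columns_df idea)."""
    result = {}
    for row in rows:
        group = tuple(row[c] for c in constant)
        # One output row per unique combination of the constant columns:
        entry = result.setdefault(group, dict(zip(constant, group)))
        # The key's entry becomes a new column of that row:
        entry[row[key]] = row[value]
    return list(result.values())
```

For example, measurement rows (n, key, value) with keys "sat" and "time" would be turned into one row per n with columns n, sat and time.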
Working again
  • Make the tools compile and run correctly.
Necessary extensions (improvements) of the old experiment-system
  • Descriptors:
    1. It must be possible to process "ad-hoc experiments"; the simplest way to do so is to allow as input a directory-name, and then all files in there will be processed.
    2. Additional parameters specifying the experiment, such as the abort-time (the time after which the experiment is aborted).
  • The database (see module OKDatabase) must be extended to also administer ad-hoc experiments.
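A descriptor for such an ad-hoc experiment could, as a sketch, simply enumerate the files of the given directory and carry the abort-time as an additional parameter. The function and field names here are hypothetical, not part of the old experiment-system.

```python
import os

def adhoc_descriptor(directory, abort_time):
    """Hypothetical descriptor for an 'ad-hoc experiment': all files in
    the given directory are to be processed, and abort_time (in seconds)
    is carried as an additional parameter of the experiment."""
    files = sorted(entry.name for entry in os.scandir(directory)
                   if entry.is_file())
    return {"directory": directory, "files": files, "abort_time": abort_time}
```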
Launching and monitoring
  • The launch-and-monitor system follows the old 3 steps (preparation, processing, transfer), but now all these steps can be launched automatically.
  • So a database of available machines is needed (access via ssh).
  • When processing of an experiment is started, the pid is extracted, so that the output of the ps-command for this process can then be shown.
  • Also "ls -l" for the experiment-directory and "less" for the files in it must be supported.
  • A new experiment is always launched on a fixed machine; we simply do not support transferring experiments to different machines, but it must be easy to abort an experiment and transfer the data obtained so far. One can ask for the available machines (with a status report about availability, how many experiments are currently running on each, how many processors the machine has, bogomips etc.), and then an experiment can be started via some command, perhaps using a chmod-like syntax for "prepare yes/no, process yes/no, transfer yes/no", say "+-+".
  • Perhaps we use simple single tools for the different actions (querying the status, launching an experiment), and use simple copy-and-paste to transfer data from one step to another.
  • See "Simple script for monitoring remote processes" above for a simple but flexible tool which just monitors (arbitrary) processes.
  • "autoson" by McKay http://cs.anu.edu.au/~bdm/autoson/ :
    1. The licence situation is unclear: We can only use software which can be freely redistributed; if it is enough just to keep the package, then this should be alright.
    2. A serious restriction seems to be the required common filesystem: so apparently the software is not usable for us, since in most situations machines are connected only by ssh?
    3. In most cases, such as on PCs in the CS linux lab, the machines have at least the user's home directory in common (via NFS etc.). Also, there are always solutions such as sshfs (see http://fuse.sourceforge.net/sshfs.html ), although admittedly these add further complexity.
  • BOINC http://boinc.berkeley.edu/ :
    • Documentation available at http://boinc.berkeley.edu/trac/wiki/ProjectMain .
    • BOINC is intended for grid computing and could be useful for small experiments on university computers, but could then easily be expanded to allow volunteers to offer computing resources for larger experiments.
    • BOINC also offers a wrapper script, so arbitrary applications can be run using its system rather than just custom BOINC-specific applications (although applications that communicate directly with the BOINC system may be able to store more information).
  • Condor http://www.cs.wisc.edu/condor/ :
    • This might be the right tool, with the needed flexibility.
    • OK has contacted condor-admin@cs.wisc.edu to explore whether it can handle our standard approach (launch/start+restart/submit).
  • The tests must be written in such a fashion that they are largely testable without accessing ssh; for testing the parts which essentially use ssh one can then simply use the host machine of the testing process.
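The chmod-like step specification mentioned above could be parsed as in the following sketch. The function name and the fixed order prepare/process/transfer are assumptions based on the discussion, not an existing interface.

```python
def parse_steps(flags):
    """Parse the chmod-like three-character specification discussed above:
    position 1 = prepare, 2 = process, 3 = transfer; '+' enables a step,
    '-' disables it.  E.g. '+-+' means prepare and transfer, but do not
    process.  (Hypothetical syntax.)"""
    if len(flags) != 3 or any(c not in "+-" for c in flags):
        raise ValueError("expected three characters from '+-'")
    names = ("prepare", "process", "transfer")
    return {name: c == "+" for name, c in zip(names, flags)}
```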

Definition in file general.hpp.