general.hpp File Reference

General plans regarding the logical analysis of data. More...

Go to the source code of this file.

Detailed Description

General plans regarding the logical analysis of data.

  • We need various functions for handling matrices, which generalise the "truth tables" found in the literature.
  • Basis is the concept of a combinatorial matrix (see ComputerAlgebra/CombinatorialMatrices/Lisp/Basics.mac).
  • So a "data frame" is a combinatorial matrix M, where M[1] (the set of row names) yields the set of "cases", M[2] (the set of column names) yields the sets of "variables", and where the values are column-specific (that is, every variables has a domain).
  • This seems to be best understood as a clause-variable matrix, where variables can be boolean or non-boolean.
    1. For the boolean case the values are -1,0,+1, and the translations from matrices to clause-sets and back are given by clvar_com2fcs and clvar_fcs2com (where for the former actually any values which can be positive, negative or zero are allowed).
    2. In QCA (yet) only "full configurations" (corresponding to full clauses) are considered, and encoded via 0 and 1; the translation function
      tt2cvm(M) := m2com(subst(-1,0,M))$
      is to be used.
    3. Displaying matrices in the usual style (for inspection) is done via using disp2d.
    4. Considering the example of Table 4 (page 44) in [Schneider, Wagemann, 2007], we get the "session"
      SW4b : matrix([0,0,0,1],[0,0,1,1],[0,1,0,1],[0,1,1,1],[1,0,0,0],[1,0,1,1],[1,1,0,0],[1,1,1,0])$
      SW4 : tt2cvm(SW4b);
      SW4F : clvar_com2fcs(SW4);
      SW10b : SW4b;
      SW10b[4,4] : 0;
      SW10b[6,4] : 0;
      SW10 : tt2cvm(SW10b);
      SW10F : clvar_com2fcs(SW10);
    5. For the non-boolean case we can have uniform or function domains (as in ComputerAlgebra/Satisfiability/Lisp/ClauseSets/NonBoolean.mac).
    6. So here we should again have just (combinatorial) matrices, and (combinatorial) matrices with domains (as we have it for nb-clause-sets).
  • Perhaps we could use combinatorial matrices over {0,1}:
    1. The row-names would then be the case-names.
    2. While the column-names would be the variable-names.
    3. The data-matrix from [The Outcomes of Homeless Mobilization; Cress, Snow, The American Journal of Sociology, 2000], Table 3 would then be available as follows:
      CS : mrc2ocom(matrix(
       ["PUH", "AOS", "OUH", "TUH", "PUEJ", "DtUH", "HCRP", "BUH", "DnUH", "HF", "HUH", "HU", "MUH", "HPU", "MC"],
       ["Via", "DisT", "SymA", "CSup", "DiagF", "ProgF", "Rep", "Res", "Ri", "Re"])$
      CS[3]("TUH", "SymA");
    4. Via
      ttcom2cvm(M) := subst(-1,0,M)$
      CSc : ttcom2cvm(CS);
      we obtain a "clause-variable" matrix. To obtain a clause-set, we need to wrap the variable-strings ("Via" etc.) by a variable-producing function, for example the generic wrapper "gv":
      FF : clvar_w_ocom2fcl(CSc, gv);
    5. Though in principle it seems that using such variables is reasonable; only perhaps we typically avoid the distinction between small and capital letters, just using only small letters.
    6. An alternative would be to introduce dedicated variables (like Via), however then we would get easily naming-conflicts.
    7. (DONE Bug fixed in Maxima 5.21.1. See "Strings cause errors in evaluation of expressions" in ComputerAlgebra/plans/Maxima.hpp) When wrapping variable names using a generic variable wrapper we get
      FF : clvar_w_ocom2fcl(CSc, gv);
      The error is due to the Maxima-bug when handling strings; so currently we can't have variables like gv("Via").
Representing partial boolean functions
  • Given a set V of variables, a partial boolean function f could be represented by a pair [T,F], where T,F are disjoint sets of total assignments over V, representing the true resp. false points of f.
  • However, falsifying assignments are the domain of CNF, not DNF, and so the proper generalisation, allowing arbitrary clause-sets T,F over V, seems to be a triple [V,T,F], such that T,F are clause-sets over V, and such that the satisfying assignment of T as DNF are disjoint from the falsifying assignments of F as CNF.
  • So T directly represents the sufficient conditions, while F directly represents the necessary conditions.
  • An example:
    1. Let V={1,2}.
    2. Let [T,F] be [{{1,2}},{{-1,-2}}], saying that <1->1, 2->1> -> true, and <1->0, 2->0> -> false.
    3. The (here unique) triple-representation is [{1,2}, {{1,2}}, {{1,2}}].
Analysing the extension-landscape
  • The goal is to represent a partial boolean function using as few as many case-distinctions (clauses), either as DNF or as CNF.
  • The reason for this minimisation is that the cases represent a fundamentally different domain, which needs a handling on its own, and thus for economic principles we would like to have as few cases as possible.
  • Given a partial boolean function P = [V,T,F] in triple representation, we do not make any assumptions on the possible extensions.
  • The immediate goal is to find shortest CNF or CNF G for P.
  • These CNF/DNF are naturally (total) boolean functions over V, however the goal is not to find "good" or "reasonable" extensions, but to explore the landscape.
  • This could mean the following (now considering only DNF or only CNF representations, and assuming that we want to have K cases):
    1. Find all G (as above) of length K.
    2. The total assignments which satisfy/falsify all G are the sufficient/necessary assumptions we have to make in order to achieve (only) K cases.
    3. The total assignments which satisfy/falsify no G are the sufficient/necessary assumptions we have to avoid in order to achieve (only) K cases.
    4. These two assumption-types seem most natural. The remaining assumptions, satisfying/falsifying some G, are all possible for a K-representation; they are not independent of each other, and one could create a directed graph with the dependency relations.
    5. In the first two cases one gets sets of total assignments, representing boolean functions, and again one can seek for short representations of these boolean functions (here now we don't have the problem of extension, but all we need is a usable representation of these boolean functions).
  • The first step is to consider multi-valued tables.
  • Later we also need to consider "fuzzy sets"; see ComputerAlgebra/FuzzySets/Lisp/plans/general.hpp.
  • Especially for the relational point of view we can make good distinctions:
    1. At the first level we have relations over {0,1}^m (the boolean level).
    2. Then we have relations over D^m, where D is some finite set (the multivalued level).
    3. Then we have relations over [0,1]^m (where [0,1] is the interval from 0 to 1) (the fuzzy level).
    4. More generally we could allow relations over S^m, where S is some (structured) set allowing for "boolean operations".
    5. Compare "More general notions" in ComputerAlgebra/FuzzySets/Lisp/plans/general.hpp.
    6. The general axiomatic framework needs to make sure that the basic operations like representations via CNF/DNF clause-sets are possible (so that minimisation of representation size can be considered, and this in a way similar to the boolean case).
    7. In all these cases, every column stands for some "variable" (or "condition"), every row for some (observed) "case" (or "configuration"), and every entry for the "degree" to which in this specific case the "condition" is attained.
    8. The notion of a relation is just the standard one (the observation has been made, thus is true, or it is deemed as impossible, thus is false).
    9. Finally, one could also allow generalised relations, which map from S^m to some set of truth values. Then the "degree" to which the observation is "in" or "out" allows also variations.

Definition in file general.hpp.