truthTab: Assemble cases with identical configurations in a truth table

Description

The truthTab function assembles cases with identical configurations from a crisp-set (cs), multi-value (mv), or fuzzy-set (fs) data frame in a table called a truth table (which is a very different type of object for CNA than for the related method of QCA).

Usage

truthTab(x, type = c("cs", "mv", "fs"), frequency = NULL,
         case.cutoff = 0, rm.dup.factors = TRUE, rm.const.factors = TRUE,
         .cases = NULL, verbose = TRUE)
cstt(...)
mvtt(...)
fstt(...)
# S3 method for truthTab
print(x, show.cases = NULL, ...)

Arguments

x

Data frame or matrix.

type

Character vector specifying the type of x: "cs" (crisp-set), "mv" (multi-value), or "fs" (fuzzy-set).

frequency

Numeric vector of length nrow(x). All elements must be non-negative.

case.cutoff

Minimum number of occurrences (cases) of a configuration in x. Configurations with fewer than case.cutoff occurrences (cases) are not included in the truth table.

rm.dup.factors

Logical; if TRUE, all but the first of a set of factors with identical values in x are eliminated.

rm.const.factors

Logical; if TRUE, factors with constant values in x are eliminated.

.cases

Set case labels (row names): optional character vector of length nrow(x).

verbose

Logical; if TRUE, some messages on the truth table are printed.

show.cases

Logical; if TRUE, the attribute “cases” is printed.

…

In cstt, mvtt, fstt: any formal argument of truthTab except type. In print.truthTab: arguments passed to print.data.frame.

Value

An object of type “truthTab”, i.e. a data.frame with additional attributes “type”, “n” and “cases”.

Details

The first input x of the truthTab function is a data frame. To ensure that the atomic and complex solution formulas (asf and csf) issued later cannot be misinterpreted, users are advised to use only upper case letters as factor (column) names. Column names may contain numbers, but the first character of a column name must be a letter. Only ASCII characters should be used for column and row names.
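The naming advice above can be checked before calling truthTab. The following base R snippet is only a sketch; the helper ok_name is hypothetical and not part of the cna package, and perl = TRUE is used so that the character ranges are read as plain ASCII ranges regardless of locale.

```r
# Hypothetical helper (not part of cna): TRUE for names that consist of
# upper case ASCII letters and digits and that start with a letter.
ok_name <- function(nms) grepl("^[A-Z][A-Z0-9]*$", nms, perl = TRUE)

ok_name(c("A", "X1", "1X", "gdp"))
# "A" and "X1" pass; "1X" starts with a digit and "gdp" is lower case.
```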

The truthTab function merges multiple rows of x featuring the same configuration into one row, such that each row of the resulting table, which is called a truth table, corresponds to one determinate configuration of the factors in x. The number of occurrences (cases) and an enumeration of the cases are saved as attributes “n” and “cases”, respectively. The attribute “n” is always printed in the output of truthTab; the attribute “cases” is printed if the argument show.cases is TRUE in the print method.
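The merging step can be pictured with a few lines of base R. This is only a sketch of the idea under simple assumptions, not the package's implementation; the helper name collapse_configs and the toy data are introduced purely for illustration.

```r
# Hypothetical helper (not part of cna): collapse identical rows of a data
# frame and record how often, and in which cases, each configuration occurs.
collapse_configs <- function(x) {
  # One key string per row; identical configurations share a key.
  key <- do.call(paste, c(x, sep = "\r"))
  first <- !duplicated(key)
  out <- x[first, , drop = FALSE]
  rownames(out) <- NULL
  # Group case labels (row names) by configuration, in order of first appearance.
  groups <- split(rownames(x), factor(key, levels = key[first]))
  attr(out, "n") <- unname(lengths(groups))
  attr(out, "cases") <- unname(vapply(groups, paste, character(1), collapse = ","))
  out
}

toy <- data.frame(A = c(1, 1, 0, 1), B = c(0, 0, 1, 1))
res <- collapse_configs(toy)
res                 # three distinct configurations
attr(res, "n")      # 2 1 1
attr(res, "cases")  # "1,2" "3" "4"
```

The attribute names "n" and "cases" mirror the ones described above, but the object built here is an ordinary data frame, not an object of class truthTab.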

The argument type specifies the type of data. "cs" stands for crisp-set data featuring factors that only take values 1 and 0; "mv" stands for multi-value data with factors that can take any non-negative integers as values; "fs" stands for fuzzy-set data comprising factors taking real values from the interval [0,1], which are interpreted as membership scores in fuzzy sets. To abbreviate the specification of the data type using the type argument, the functions cstt(x, ...), mvtt(x, ...), and fstt(x, ...) are available as shorthands for truthTab(x, type = "cs", ...), truthTab(x, type = "mv", ...), and truthTab(x, type = "fs", ...), respectively.
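The value ranges of the three data types can be written as simple checks. The snippet below is a rough base R sketch; fits_type is a hypothetical helper that is not part of cna, and it ignores missing values.

```r
# Hypothetical validator (not part of cna): do all values in x lie in the
# range that the given data type allows?
fits_type <- function(x, type = c("cs", "mv", "fs")) {
  type <- match.arg(type)
  v <- unlist(x, use.names = FALSE)
  switch(type,
    cs = all(v %in% c(0, 1)),           # crisp-set: only 0 and 1
    mv = all(v >= 0 & v == round(v)),   # multi-value: non-negative integers
    fs = all(v >= 0 & v <= 1)           # fuzzy-set: scores in [0, 1]
  )
}

crisp <- data.frame(A = c(1, 0, 1), B = c(0, 0, 1))
multi <- data.frame(A = c(2, 0, 1), B = c(3, 1, 1))
fuzzy <- data.frame(A = c(0.2, 0.9, 1), B = c(0.5, 0, 0.7))

fits_type(crisp, "cs")  # TRUE
fits_type(multi, "cs")  # FALSE: values other than 0 and 1
fits_type(multi, "mv")  # TRUE
fits_type(fuzzy, "fs")  # TRUE
fits_type(fuzzy, "mv")  # FALSE: non-integer values
```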

Instead of multiply listing identical configurations in x, the frequency argument can be used to indicate the frequency of each configuration in the data frame. frequency takes a numeric vector of length nrow(x) as value. For instance, truthTab(x, frequency = c(3,4,2,3)) determines that the first configuration in x is featured in 3 cases, the second in 4, the third in 2, and the fourth in 3 cases.
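What a frequency vector stands for can be made explicit by expanding the data in base R. The toy data frame and the vector freq below are assumptions for illustration; the snippet does not call truthTab itself.

```r
# Toy data: three distinct configurations, each listed once.
x <- data.frame(A = c(1, 1, 0), B = c(1, 0, 0))
freq <- c(3, 1, 2)

# Listing every case explicitly amounts to repeating row i freq[i] times.
expanded <- x[rep(seq_len(nrow(x)), times = freq), , drop = FALSE]
rownames(expanded) <- NULL

nrow(expanded)                           # 6 cases in total
tab <- table(do.call(paste, expanded))   # cases per configuration
tab
```

The counts in tab agree with freq, although table() sorts the configurations by their labels rather than by their order in x.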

The case.cutoff argument specifies that configurations are included in the truth table only if they are instantiated at least as many times in x as the number assigned to case.cutoff. Put differently, configurations instantiated fewer times than case.cutoff are excluded from the truth table. For instance, truthTab(x, case.cutoff = 3) entails that configurations with fewer than 3 cases are excluded.
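The cutoff rule is a plain "at least" comparison on the case counts. The following base R lines sketch it on made-up counts; the objects configs, n, and case_cutoff are illustrative and not produced by cna.

```r
# Toy configurations with their (assumed) case counts.
configs <- data.frame(A = c(1, 1, 0, 0), B = c(1, 0, 1, 0))
n <- c(4, 2, 3, 1)
case_cutoff <- 3

keep <- n >= case_cutoff                  # at least 3 cases required
kept <- configs[keep, , drop = FALSE]     # rows 1 and 3 remain
kept
n[keep]                                   # 4 3
```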

rm.dup.factors and rm.const.factors determine whether all but the first of a set of duplicated factors (i.e. factors with identical value distributions in x) are eliminated and whether constant factors (i.e. factors with the same value in all cases (rows) of x) are eliminated. From the perspective of configurational causal modeling, factors with constant values in all cases can be modeled neither as causes nor as outcomes; therefore, they can be removed prior to the analysis. Factors with identical value distributions cannot be distinguished configurationally, meaning they are one and the same factor as far as configurational causal modeling is concerned. Therefore, truthTab by default retains only one factor of a set of duplicated factors.
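Constant and duplicated factors are easy to spot in base R. The snippet below is a minimal sketch on a toy data frame; it illustrates the two notions and makes no claim about how truthTab detects them internally.

```r
x <- data.frame(
  A = c(1, 0, 1, 0),
  B = c(1, 1, 1, 1),   # constant: same value in every case
  C = c(1, 0, 1, 0),   # duplicate of A
  D = c(0, 0, 1, 1)
)

# A factor is constant if it takes exactly one distinct value.
is_const <- vapply(x, function(col) length(unique(col)) == 1, logical(1))
# duplicated() flags all but the first of a set of identical columns.
is_dup <- duplicated(as.list(x))

names(x)[is_const]   # "B"
names(x)[is_dup]     # "C"
reduced <- x[, !(is_const | is_dup), drop = FALSE]
names(reduced)       # "A" "D"
```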

.cases can be used to set case labels (row names). It is a character vector of length nrow(x).

The show.cases argument of the print method determines whether the case labels of x are printed. By default, the cases are printed unless the (comma-separated) list of cases exceeds 20 characters in at least one row.

References

Aleman, Jose. 2009. “The Politics of Tripartite Cooperation in New Democracies: A Multi-level Analysis.” International Political Science Review 30 (2):141-162.

Greckhamer, Thomas, Vilmos F. Misangyi, Heather Elms, and Rodney Lacey. 2008. “Using Qualitative Comparative Analysis in Strategic Management Research: An Examination of Combinations of Industry, Corporate, and Business-Unit Effects.” Organizational Research Methods 11 (4):695-726.

Thiem, Alrik. 2018. “QCApro: Advanced Functionality for Performing and Evaluating Qualitative Comparative Analysis.” R Package Version 1.1-2. URL: http://www.alrik-thiem.net/software/.

Examples

# NOT RUN {
# Manual input of cs data
# -----------------------
dat1 <- data.frame(
  A = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0),
  B = c(1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0),
  C = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0),
  D = c(1,1,1,1,0,0,0,0,1,1,1,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,0,0,0,1,1,1,0,0,0),
  E = c(1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,1,1,1,0,1,1,1,1,1,1,0,0,0)
)

# Default return of the truthTab function.
truthTab(dat1)

# Recovering the cases featuring each configuration by means of the print function.
print(truthTab(dat1), show.cases = TRUE)

# The same truth table as before can be generated by using the frequency argument while
# listing each configuration only once.
dat1 <- data.frame(
  A = c(1,1,1,1,1,1,0,0,0,0,0),
  B = c(1,1,1,0,0,0,1,1,1,0,0),
  C = c(1,1,1,1,1,1,1,1,1,0,0),
  D = c(1,0,0,1,0,0,1,1,0,1,0),
  E = c(1,1,0,1,1,0,1,0,1,1,0)
)
truthTab(dat1, frequency = c(4,3,1,3,4,1,10,1,3,3,3))

# Set (random) case labels.
print(truthTab(dat1, .cases = sample(letters, nrow(dat1), replace = FALSE)),
      show.cases = TRUE)

# Truth tables generated by truthTab can be input into the cna function.
dat1.tt <- truthTab(dat1, frequency = c(4,3,1,3,4,1,4,1,3,3,3))
# }
# NOT RUN {
cna(dat1.tt, con = .85, details = TRUE)
# }
# NOT RUN {
# The case.cutoff argument excludes configurations with fewer than 2 cases
# (which yields perfect consistency and coverage scores for dat1).
dat1.tt <- truthTab(dat1, frequency = c(4,3,1,3,4,1,4,1,3,3,3), case.cutoff = 2)
# }
# NOT RUN {
cna(dat1.tt, details = TRUE)
# }
# NOT RUN {

# Simulating multi-value data with biased samples (exponential distribution)
# --------------------------------------------------------------------------
dat1 <- allCombs(c(3,3,3,3,3))
set.seed(32)
m <- nrow(dat1)
wei <- rexp(m)
dat2 <- dat1[sample(nrow(dat1), 100, replace = TRUE, prob = wei),]
truthTab(dat2, type = "mv") # 100 cases with 46 configurations instantiated only once.
mvtt(dat2, case.cutoff = 2) # removing the single instances.

# Retain both duplicated and constant factors.
dat3 <- selectCases("(A=1+A=2+A=3 <-> C=2)*(B=3<->D=3)*(B=2<->D=2)*(A=2 + B=1 <-> E=2)",
                    dat1, type = "mv")
# }
# NOT RUN {
mvtt(dat3, rm.dup.factors = FALSE, rm.const.factors = FALSE)
# }
# NOT RUN {

# truthTab with fuzzy-set data from Aleman (2009)
# -----------------------------------------------
# Include all cases.
tt.pacts <- fstt(d.pacts) 
# }
# NOT RUN {
fscna(tt.pacts, con = .93, cov = .86, details = TRUE)
# }
# NOT RUN {
# Only include configurations with at least 3 cases.
tt.pacts2 <- fstt(d.pacts, case.cutoff = 3) 
# }
# NOT RUN {
fscna(tt.pacts2, con = .93, cov = .86, details = TRUE)
# }
# NOT RUN {

# Large-N data with crisp sets from Greckhamer et al. (2008)
#-----------------------------------------------------------
truthTab(d.performance[1:8], frequency = d.performance$frequency)

# Eliminate configurations with fewer than 5 cases.
truthTab(d.performance[1:8], frequency = d.performance$frequency, case.cutoff = 5)

# Various large-N CNAs of d.performance with varying case cut-offs.
# }
# NOT RUN {
cna(truthTab(d.performance[1:8], frequency = d.performance$frequency, case.cutoff = 4),
    ordering = list("SP"), con = .75, cov = .6)
cna(truthTab(d.performance[1:8], frequency = d.performance$frequency, case.cutoff = 5),
    ordering = list("SP"), con = .75, cov = .6)
cna(truthTab(d.performance[1:8], frequency = d.performance$frequency, case.cutoff = 10),
    ordering = list("SP"), con = .75, cov = .6)
print(cna(truthTab(d.performance[1:8], frequency = d.performance$frequency, case.cutoff = 15),
          ordering = list("SP"), con = .75, cov = .6, what = "a"), nsolutions = "all")
# }
