condition: Uncover relevant properties of msc, asf, and csf in a data frame or `truthTab`

Description

The condition function helps inspect the properties of msc, asf, and csf (as returned by cna) in a data frame or truthTab, but also of any other Boolean function. It reveals which configurations and cases instantiate a given msc, asf, or csf and lists consistency and coverage scores.

Usage

condition(x, ...)
# S3 method for default
condition(x, tt, type, add.data = FALSE,
          force.bool = FALSE, rm.parentheses = FALSE, ...)
# S3 method for condTbl
condition(x, tt, ...)
cscond(...)
mvcond(...)
fscond(...)
# S3 method for condList
print(x, ...)
# S3 method for condList
summary(object, ...)
# S3 method for cond
print(x, digits = 3, print.table = TRUE, 
      show.cases = NULL, add.data = NULL, ...)
group.by.outcome(condlst, cases = TRUE)

Arguments

x

Character vector specifying a Boolean expression as "A + B*C -> D", where "A","B","C","D" are column names in tt.

tt

Data frame or truthTab (see truthTab).

type

Character vector specifying the type of tt: "cs" (crisp-set), "mv" (multi-value), or "fs" (fuzzy-set). Defaults to the type of tt, if tt is a truthTab or to "cs" otherwise.

add.data

Logical; if TRUE, tt is attached to the output. Alternatively, the tt can be specified as the add.data argument in print.cond.

force.bool

Logical; if TRUE, x is interpreted as a mere Boolean function, not as a causal model.

rm.parentheses

Logical; if TRUE, parentheses around x are removed prior to evaluation.

digits

Number of digits to print in consistency and coverage scores.

print.table

Logical; if TRUE, the table assigning configurations and cases to conditions is printed.

show.cases

In print.cond: logical; if TRUE, the attribute “cases” of the truthTab is printed. Same default behavior as in print.truthTab.

object

Object of class “condList”, as returned by condition.

condlst

List of objects, each of them of class “cond”, as returned by condition.

cases

Logical; if TRUE, the returned data frame has a column named “cases”.

…

In cscond, mvcond, fscond: any formal argument of condition except type.

Value

condition returns a list of objects, each of them corresponding to one element of the input vector x. The list has a class attribute “condList”, the list elements (i.e., the individual conditions) are of class “cond” and have a more specific class label “booleanCond”, “atomicCond” or “complexCond”, according to the condition type. The components of class “booleanCond” or “atomicCond” are amended data frames, those of class “complexCond” are lists of amended data frames.

group.by.outcome returns a list of data frames, one data frame for each factor appearing as an outcome in condlst.

print and summary methods

print.condList essentially executes print.cond successively for each list element/condition. All arguments in print.condList are thereby passed to print.cond, i.e. digits, print.table, show.cases, add.data can also be specified when printing the complete list of conditions.

The summary method for class “condList” is identical to printing with print.table = FALSE.

The option “spaces” controls how the conditions are rendered in certain contexts. The current setting is queried by typing getOption("spaces"). The option specifies characters that will be printed with a space before and after them. The default is c("<->","->","+"). A more compact output is obtained with options(spaces = NULL).
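The following base R sketch shows one way to change the “spaces” option temporarily and restore it afterwards. The first line only mimics the default stated above so that the snippet runs without cna loaded; the variable names are illustrative.

```r
# Mimic the documented default (cna is assumed to set this itself when loaded).
options(spaces = c("<->", "->", "+"))

# options() returns the previous values, so they can be restored later.
old <- options(spaces = NULL)
compact <- getOption("spaces")   # NULL: no operator is padded with spaces

# ... print condList objects here to get the compact rendering ...

# Restore whatever was set before.
options(old)
restored <- getOption("spaces")
```

Storing the return value of options() avoids hard-coding the default when switching back.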

Details

Depending on the processed data frame or truthTab, the solutions output by cna are often ambiguous; that is, it can happen that many solution formulas fit the data equally well. In such cases, the data alone are insufficient to single out one solution. While cna simply lists the possible solutions, the condition function is intended to provide assistance in comparing different minimally sufficient conditions (msc), atomic solution formulas (asf), and complex solution formulas (csf) in order to have a better basis for selecting among them.

Most importantly, the output of the condition function highlights in which configurations and cases in the data an msc, asf, or csf is instantiated. Thus, if the user has independent causal knowledge about particular configurations or cases, the information received from condition may be helpful in selecting the solutions that are consistent with that knowledge. Moreover, the condition function allows for directly contrasting consistency and coverage scores or frequencies of different conditions contained in the returned asf.

The condition function is independent of cna. That is, any msc, asf, or csf, irrespective of whether it is output by cna, can be given as input to condition. Even Boolean expressions that do not have the syntax of CNA solution formulas can be passed to condition.

The first required input x of condition is a character vector consisting of Boolean formulas composed of factor names that are column names of tt, which is the second required input. tt can be a truthTab or a data frame. In the latter case, condition must be told what type of data tt contains, and the data frame will be converted to a truthTab. Data that feature factors taking values 1 or 0 only are called crisp-set, in which case the type argument takes its default value "cs". If the data contain at least one factor that takes more than two values, e.g. {1,2,3}, the data count as multi-value, which is indicated by type = "mv". Data featuring at least one factor taking real values from the interval [0,1] count as fuzzy-set, which is specified by type = "fs". To abbreviate the specification of the data type, the functions cscond(x, tt, ...), mvcond(x, tt, ...), and fscond(x, tt, ...) are available as shorthands for condition(x, tt, type = "cs", ...), condition(x, tt, type = "mv", ...), and condition(x, tt, type = "fs", ...), respectively.
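The three data types described above can be told apart with a simple check. The sketch below is a hypothetical helper (guess_type is not part of cna) written in base R on made-up data. Note that condition itself does not guess the type of a data frame: it defaults to "cs" unless type is given.

```r
# Hypothetical helper: classify a numeric data frame as "cs", "mv" or "fs"
# following the definitions given in the text. Not a cna function.
guess_type <- function(df) {
  vals <- unlist(df, use.names = FALSE)
  if (all(vals %in% c(0, 1))) return("cs")        # only 0 and 1
  if (all(vals >= 0 & vals <= 1)) return("fs")    # real values within [0, 1]
  if (all(vals == round(vals))) return("mv")      # integers beyond 0 and 1
  stop("values fit none of the three data types")
}

# Made-up toy data for each type.
cs_data <- data.frame(A = c(1, 0, 1), B = c(0, 1, 1))
mv_data <- data.frame(A = c(1, 2, 3), B = c(1, 0, 1))
fs_data <- data.frame(A = c(0.2, 0.9, 1), B = c(0, 0.5, 0.7))

types <- c(guess_type(cs_data), guess_type(mv_data), guess_type(fs_data))
```

The result of such a check could then be passed on as the type argument, or used to pick between cscond, mvcond, and fscond.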

Conjunction can be expressed by “*” or “&”, disjunction by “+” or “|”, negation can be expressed by “-” or “!” or, in case of crisp-set or fuzzy-set data, by changing upper case into lower case letters and vice versa, implication by “->”, and equivalence by “<->”. Examples are

A*b -> C, A+b*c+!(C+D), A*B*C + -(E*!B), C -> A*B + a*b
(A=2*B=4 + A=3*B=1 <-> C=2)*(C=2*D=3 + C=1*D=4 <-> E=3)
(A=2*B=4*!(A=3*B=1)) | !(C=2|D=4)*(C=2*D=3 + C=1*D=4 <-> E=3)
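To make the crisp-set notation concrete, the sketch below translates an expression without “->” or “<->” into R's own logical operators and evaluates it on made-up data. eval_cs is a hypothetical helper, not how condition is implemented, and it assumes single-letter factor names and crisp-set data only.

```r
# Hypothetical helper: evaluate a crisp-set Boolean expression (no arrows)
# by rewriting it in R syntax. Assumes single-letter factor names.
eval_cs <- function(expr, df) {
  e <- gsub("[[:space:]]", "", expr)
  e <- gsub("([a-z])", "!\\U\\1", e, perl = TRUE)  # lower case means negation
  e <- gsub("*", "&", e, fixed = TRUE)             # conjunction
  e <- gsub("+", "|", e, fixed = TRUE)             # disjunction
  e <- gsub("-", "!", e, fixed = TRUE)             # "-" as negation
  as.integer(eval(parse(text = e), envir = lapply(df, as.logical)))
}

# Made-up crisp-set data with four cases.
toy <- data.frame(A = c(1, 1, 0, 0),
                  B = c(1, 0, 1, 0),
                  C = c(0, 0, 1, 0))

res1 <- eval_cs("A*b + C", toy)   # A and not-B, or C
res2 <- eval_cs("-(A + B)", toy)  # neither A nor B
```

The rewrite relies on R giving “!” higher precedence than “&”, and “&” higher precedence than “|”, which matches the usual reading of negation, conjunction, and disjunction.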

Three types of conditions are distinguished:

The type boolean comprises Boolean expressions that do not have the syntactic form of causal models, meaning the corresponding character strings in the argument x do not have an “->” or “<->” as main operator. Examples: "A*B + C" or "-(A*B + -(C+d))". The expression is evaluated and written into a data frame with one column. Frequency is attached to this data frame as an attribute.
The type atomic comprises expressions that have the syntactic form of atomic causal models, i.e. asf, meaning the corresponding character strings in the argument x have an “->” or “<->” as main operator. Examples: "A*B + C -> D" or "A*B + C <-> D". The expressions on both sides of “->” and “<->” are evaluated and written into a data frame with two columns. Consistency and coverage are attached to these data frames as attributes.
The type complex represents complex causal models, i.e. csf. Example: "(A*B + a*b <-> C)*(C*d + c*D <-> E)". Each component must be a causal model of type atomic. These components are evaluated separately and the results stored in a list. Consistency and coverage of the complex expression are then attached to this list.

The types of the character strings in the input x are automatically discerned and thus do not need to be specified by the user.
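For a condition of type atomic with “->” as main operator, consistency and coverage can be reproduced by hand. The sketch below uses the standard set-theoretic definitions (cf. Ragin 2008) on made-up crisp-set data; it is an illustration in base R, and the internals of condition may differ in detail.

```r
# Made-up crisp-set data with five cases.
toy <- data.frame(A = c(1, 1, 0, 0, 1),
                  B = c(1, 0, 1, 0, 1),
                  C = c(0, 0, 1, 0, 0),
                  D = c(1, 0, 1, 0, 0))

# Evaluate both sides of "A*B + C -> D" case by case.
lhs <- with(toy, as.integer((A & B) | C))
rhs <- toy$D

# Consistency: share of cases instantiating the left-hand side
# that also instantiate the outcome.
consistency <- sum(pmin(lhs, rhs)) / sum(lhs)

# Coverage: share of cases instantiating the outcome
# that are accounted for by the left-hand side.
coverage <- sum(pmin(lhs, rhs)) / sum(rhs)
```

Using pmin for the intersection means the same two formulas carry over to fuzzy-set membership scores in [0,1].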

If force.bool = TRUE, expressions with “->” or “<->” are treated as type boolean, i.e. only their frequencies are calculated. Enclosing a character string representing a causal model in parentheses has the same effect as specifying force.bool = TRUE. rm.parentheses = TRUE removes parentheses around the expression prior to evaluation, and thus has the reverse effect of setting force.bool = TRUE.

If add.data = TRUE, tt is appended to the output such as to facilitate the analysis and evaluation of a model on the case level.

The digits argument of the print function determines how many digits of consistency and coverage scores are printed. If print.table = FALSE, the table assigning conditions to configurations and cases is omitted, i.e. only frequencies or consistency and coverage scores are returned. row.names = TRUE also lists the row names in tt. If rows in a tt are instantiated by many cases, those cases are not printed by default. They can be recovered by show.cases = TRUE.

group.by.outcome takes a condList as input, i.e. a list of “cond” objects as returned by condition, and combines the entries in that list into a data frame with a larger number of columns. The additional attributes (consistencies etc.) are thereby removed.

References

Emmenegger, Patrick. 2011. “Job Security Regulations in Western Democracies: A Fuzzy Set Analysis.” European Journal of Political Research 50(3):336-64.

Lam, Wai Fung, and Elinor Ostrom. 2010. “Analyzing the Dynamic Complexity of Development Interventions: Lessons from an Irrigation Experiment in Nepal.” Policy Sciences 43(2):1-25.

Ragin, Charles. 2008. Redesigning Social Inquiry: Fuzzy Sets and Beyond. Chicago, IL: University of Chicago Press.

Examples

# NOT RUN {
# Crisp-set data from Lam and Ostrom (2010) on the impact of development interventions 
# ------------------------------------------------------------------------------------
# Build a truth table for d.irrigate.
irrigate.tt <- truthTab(d.irrigate)

# Any Boolean functions involving the factors "A", "R", "F", "L", "C", "W" in d.irrigate
# can be tested by condition.
condition("A*r + L*C", irrigate.tt)
condition(c("A*r + !(L*C)", "A*-(L | -F)", "C -> A*R + C*l"), irrigate.tt)
condition(c("A*r + L*C -> W", "!(A*L*R -> W)", "(A*R + C*l <-> F)*(W*a -> F)"),
          irrigate.tt)

# Group expressions with "->" by outcome.
irrigate.con <- condition(c("A*r + L*C -> W", "A*L*R -> W", "A*R + C*l -> F", "W*a -> F"),
                          irrigate.tt)
group.by.outcome(irrigate.con)

# Pass minimally sufficient conditions inferred by cna to condition.
irrigate.cna1 <- cna(d.irrigate, ordering = list(c("A","R","L"),c("F","C"),"W"), con = .9)
condition(msc(irrigate.cna1)$condition, irrigate.tt)

# Pass atomic solution formulas inferred by cna to condition
# (irrigate.cna1 from above is reused).
condition(asf(irrigate.cna1)$condition, irrigate.tt)

# Group by outcome.
irrigate.cna1.msc <- condition(msc(irrigate.cna1)$condition, irrigate.tt)
group.by.outcome(irrigate.cna1.msc)

irrigate.cna2 <- cna(d.irrigate, con = .9)
irrigate.cna2a.asf <- condition(asf(irrigate.cna2)$condition, irrigate.tt)
group.by.outcome(irrigate.cna2a.asf)

# Add data.
(irrigate.cna2b.asf <- condition(asf(irrigate.cna2)$condition, irrigate.tt, 
                                     add.data = TRUE))

# No spaces before and after "+".
options(spaces = c("<->", "->" ))
irrigate.cna2b.asf

# No spaces at all.
options(spaces = NULL)
irrigate.cna2b.asf

# Restore the default spacing.
options(spaces = c("<->", "->", "+"))

# Print only consistency and coverage scores.
print(irrigate.cna2a.asf, print.table = FALSE)
summary(irrigate.cna2a.asf)

# Print only 2 digits of consistency and coverage scores.
print(irrigate.cna2b.asf, digits = 2)

# Instead of a truth table as output by truthTab, it is also possible to provide a data
# frame as second input. 
condition("A*r + L*C", d.irrigate, type = "cs")
condition(c("A*r + L*C", "A*L -> F", "C -> A*R + C*l"), d.irrigate, type = "cs")
condition(c("A*r + L*C -> W", "A*L*R -> W", "A*R + C*l -> F", "W*a -> F"), d.irrigate, 
          type = "cs")
          
          
# Fuzzy-set data from Emmenegger (2011) on the causes of high job security regulations
# ------------------------------------------------------------------------------------
# Compare the CNA solutions for outcome JSR to the solution presented by Emmenegger
# S*R*v + S*L*R*P + S*C*R*P + C*L*P*v -> JSR (p. 349), which he generated by fsQCA as
# implemented in the fs/QCA software, version 2.5.
jobsecurity.cna <- fscna(d.jobsecurity, ordering=list("JSR"), strict = TRUE, con = .97, 
                         cov= .77, maxstep = c(4, 4, 15))
compare.sol <- fscond(c(asf(jobsecurity.cna)$condition, "S*R*v + S*L*R*P + S*C*R*P + 
                         C*L*P*v -> JSR"), d.jobsecurity)
summary(compare.sol)
print(compare.sol, add.data = d.jobsecurity)
group.by.outcome(compare.sol)

# There exist even more high quality solutions for JSR.
jobsecurity.cna2 <- fscna(d.jobsecurity, ordering=list("JSR"), strict = TRUE, con = .95, 
                          cov= .8, maxstep = c(4, 4, 15))
compare.sol2 <- fscond(c(asf(jobsecurity.cna2)$condition, "S*R*v + S*L*R*P + S*C*R*P + 
                         C*L*P*v -> JSR"), d.jobsecurity)
summary(compare.sol2)
group.by.outcome(compare.sol2)


# Simulate multi-value data
# -------------------------
library(dplyr)
# Define the data generating structure.
groundTruth <- "(A=2*B=1 + A=3*B=3 <-> C=1)*(C=1*D=2 + C=2*D=3 <-> E=3)"
# Generate ideal data on groundTruth.
fullData <- allCombs(c(3, 3, 2, 3, 3))
idealData <- tt2df(selectCases(groundTruth, fullData, type = "mv"))
# Randomly add 15% inconsistent cases.
inconsistentCases <- setdiff(fullData, idealData)
realData <- rbind(idealData, inconsistentCases[sample(1:nrow(inconsistentCases), 
                                               nrow(idealData)*0.15), ])
# Determine model fit of groundTruth and its submodels. 
condition(groundTruth, realData, type = "mv")
mvcond(groundTruth, realData)
mvcond("A=2*B=1 + A=3*B=3 <-> C=1", realData)
mvcond("A=2*B=1 + A=3*B=3 <-> C=1", realData, force.bool = TRUE)
mvcond("(C=1*D=2 + C=2*D=3 <-> E=3)", realData)
mvcond("(C=1*D=2 + C=2*D=3 <-> E=3)", realData, rm.parentheses = TRUE)
mvcond("(C=1*D=2 +!(C=2*D=3 + A=1*B=1) <-> E=3)", realData)
# Manually calculate unique coverages, i.e. the ratio of an outcome's instances
# covered by individual msc alone (for details on unique coverage cf.
# Ragin 2008:63-68).
summary(mvcond("A=2*B=1 * -(A=3*B=3) <-> C=1", realData)) # unique coverage of A=2*B=1
summary(mvcond("-(A=2*B=1) * A=3*B=3 <-> C=1", realData)) # unique coverage of A=3*B=3
# }