cna: Perform Coincidence Analysis

Description

The cna function performs Coincidence Analysis. It identifies atomic solution formulas (asf), i.e. minimally necessary disjunctions of minimally sufficient conditions, for all outcomes in the data and combines the recovered asf into complex solution formulas (csf) representing multi-outcome structures, e.g. common-cause and/or causal-chain structures.

Usage

cna(x, type, ordering = NULL, strict = FALSE, con = 1, cov = 1, con.msc = con,
    notcols = NULL, rm.const.factors = TRUE, rm.dup.factors = TRUE,  
    maxstep = c(3, 3, 9), inus.only = FALSE, only.minimal.msc = TRUE,  
    only.minimal.asf = TRUE, maxSol = 1e6, suff.only = FALSE, 
    what = if (suff.only) "m" else "ac", cutoff = 0.5, 
    border = c("down", "up", "drop"), details = FALSE)
cscna(...)
mvcna(...)
fscna(...)
# S3 method for cna
print(x, what = x$what, digits = 3, nsolutions = 5,
      details = x$details, show.cases = NULL, ...)

Arguments

x

Data frame or truthTab (as output by truthTab).

type

Character vector specifying the type of x: "cs" (crisp-set), "mv" (multi-value), or "fs" (fuzzy-set).

ordering

List of character vectors specifying the causal ordering of the factors in x.

strict

Logical; if TRUE, factors on the same level of the causal ordering are not potential causes of each other; if FALSE, factors on the same level are potential causes of each other.

con

Numeric scalar between 0 and 1 to set the minimum consistency threshold every minimally sufficient condition (msc), atomic solution formula (asf), and complex solution formula (csf) must satisfy. (See also the argument con.msc below).

cov

Numeric scalar between 0 and 1 to set the minimum coverage threshold every asf and csf must satisfy.

con.msc

Numeric scalar between 0 and 1 to set the minimum consistency threshold every msc must satisfy. Allows for imposing a consistency threshold on msc that differs from the value con imposes on asf and csf. Defaults to con.

maxstep

Vector of three integers; the first specifies the maximum number of conjuncts in each disjunct of an asf, the second specifies the maximum number of disjuncts in an asf, the third specifies the maximum complexity of an asf. The complexity of an asf is the total number of exogenous factors in the asf.

inus.only

Logical; if TRUE, only disjunctive normal forms that are free of logical redundancies are retained as asf (see also is.inus). Defaults to FALSE.

only.minimal.msc

Logical; if TRUE (the default), only minimal conjunctions are retained as msc. If FALSE, sufficient conjunctions are not required to be minimal, in which case the number of msc will usually be much greater.

only.minimal.asf

Logical; if TRUE (the default), only minimal disjunctions are retained as asf. If FALSE, necessary disjunctions are not required to be minimal, in which case the number of asf will usually be much greater.

maxSol

Maximum number of asf calculated.

suff.only

Logical; if TRUE, the function only searches for msc and does not search for asf and csf.

notcols

Character vector of factors to be negated in x. If notcols = "all", all factors in x are negated.

rm.const.factors, rm.dup.factors

Logical; if TRUE (default), factors with constant values are removed and all but the first of a set of duplicated factors are removed. These parameters are passed to truthTab.

what

Character string specifying what to print; "t" for the truth table, "m" for msc, "a" for asf, "c" for csf, and "all" for all. Defaults to "ac" if suff.only = FALSE, and to "m" otherwise.

cutoff

Minimum membership score required for a factor to count as instantiated in the data and to be integrated in the analysis. Value in the unit interval (0,1). The default cutoff is 0.5. Only meaningful if type="fs".

border

Character vector specifying whether factors with membership scores equal to cutoff are rounded up ("up"), rounded down ("down") or dropped from the analysis ("drop"). Only meaningful if type="fs".

details

Either TRUE/FALSE, or a character vector with possible elements "inus", "exhaustiveness", "faithfulness", "coherence", "redundant". The strings can also be abbreviated, e.g. "i" for "inus", "e" or "exh" for "exhaustiveness", etc.

digits

Number of digits to print in consistency, coverage, exhaustiveness, faithfulness, and coherence scores.

nsolutions

Maximum number of msc, asf, and csf to print. Alternatively, nsolutions = "all" will print all solutions.

show.cases

Logical; if TRUE, the truthTab's attribute “cases” is printed. See print.truthTab.

…

In cscna, mvcna, fscna: any formal argument of cna except type. In print.cna: arguments passed to other print-methods.

Value

cna returns an object of class “cna”, which amounts to a list with the following components:

`call`:	the executed function call
`x`:	the processed data frame or truth table
`ordering`:	the implemented ordering
`truthTab`:	the object of class “truthTab”, as input to `cna`
`truthTab_out`:	the object of class “truthTab”, after modification according to `notcols`
`solution`:	the solution object, which itself is composed of lists exhibiting msc, asf, and csf for all factors in `x`
`what`:	the values given to the `what` argument

Contributors

Epple, Ruedi: development, testing
Thiem, Alrik: testing

Details

The first input x of the cna function is a data frame or a truthTab. To ensure that no misinterpretations of returned asf and csf can occur, users are advised to use only upper case letters as factor (column) names. Column names may contain numbers, but the first character in a column name must be a letter. Only ASCII characters should be used for column and row names.

cna must be told what type of data x contains, unless x is a truthTab. In the latter case, the type of x is already defined. Data that feature factors taking values 1 or 0 only are called crisp-set, in which case the type argument takes its default value "cs". If the data contain at least one factor that takes more than two values, e.g. {1,2,3}, the data count as multi-value, which is indicated by type = "mv". Data featuring at least one factor taking real values from the interval [0,1] count as fuzzy-set, which is specified by type = "fs". (Note that mixing multi-value and fuzzy-set factors in one analysis is not (currently) supported). To abbreviate the specification of the data type using the type argument, the functions cscna(x, ...), mvcna(x, ...), and fscna(x, ...) are available as shorthands for cna(x, type = "cs", ...), cna(x, type = "mv", ...), and cna(x, type = "fs", ...), respectively.

A data frame or truthTab x with a corresponding type specification is the only mandatory input of the cna function. If no causal ordering is provided (see below), all factor values in x are treated as potential outcomes; more specifically, in case of "cs" and "fs" data, cna tests for all factors whether their presence (i.e. them taking the value 1) can be modeled as an outcome, and in case of "mv" data, cna tests for all factors whether any of their possible values can be modeled as an outcome. That is done by, first, identifying all minimally sufficient conditions (msc) that meet the threshold given by con.msc (resp. con, if con.msc = con) for each factor in x. Then, cna disjunctively combines these msc to minimally necessary conditions that meet the threshold given by cov such that the whole disjunction meets the threshold given by con. The resulting expressions are the atomic solution formulas (asf) for every factor value that can be modeled as outcome. The default value for con.msc, con, and cov is 1.

[Consistency and coverage measures have originally been introduced into the QCA protocol by Ragin (2006). Informally put, consistency reproduces the degree to which the behavior of an outcome obeys a corresponding sufficiency or necessity relationship or a whole causal model, whereas coverage reproduces the degree to which a sufficiency or necessity relationship or a whole model accounts for the behavior of the corresponding outcome. For details see the cna package vignette or Ragin (2006).]
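The two measures can be illustrated in a few lines of base R. The following is a minimal sketch that uses the standard definitions from Ragin (2006), not the internal code of cna; the helper names `consistency` and `coverage` and the toy data are hypothetical.

```r
# Consistency and coverage of "cond -> outcome" following Ragin (2006).
# The same formulas apply to crisp-set (0/1) and fuzzy-set ([0,1]) scores.
consistency <- function(cond, outcome) sum(pmin(cond, outcome)) / sum(cond)
coverage    <- function(cond, outcome) sum(pmin(cond, outcome)) / sum(outcome)

# Hypothetical toy data: 6 cases, condition A*b, outcome C.
dat <- data.frame(A = c(1, 1, 1, 0, 0, 1),
                  B = c(0, 0, 1, 0, 1, 0),
                  C = c(1, 1, 1, 0, 0, 0))

# Membership in the conjunction A*b (A present, B absent).
Ab <- pmin(dat$A, 1 - dat$B)

con_Ab <- consistency(Ab, dat$C)  # share of A*b cases that also show C
cov_Ab <- coverage(Ab, dat$C)     # share of C cases accounted for by A*b
```

In this toy data A*b is instantiated in three cases, two of which exhibit C, and C occurs in three cases, two of which exhibit A*b, so both scores equal 2/3.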

cna builds msc and asf from the bottom up. That is, in a first phase, cna checks whether single factor values A, b, C (where "A" stands for "A=1" and "b" for "B=0") or D=3, E=2, etc. (whose membership scores, in case of "fs" data, meet cutoff in at least one case) are sufficient for an outcome (where a factor value counts as sufficient iff it meets the threshold given by con.msc). Next, conjunctions of two factor values A*b, A*C, D=3*E=2 etc. (whose membership scores, in case of "fs" data, meet cutoff in at least one case) are tested for sufficiency. Then, conjunctions of three factor values, and so on. Whenever a conjunction (or a single factor value) is found to be sufficient, all supersets of that conjunction contain redundancies and are, thus, not considered for the further analysis. The result of that first phase is a set of msc for every outcome. To recover certain target structures in cases of noisy data, it may be useful to allow cna to also consider sufficient conditions for further analysis that are not minimal. This can be accomplished by setting only.minimal.msc to FALSE. A concrete example illustrating the utility of only.minimal.msc is provided in the example section below. (The ordinary user is advised not to change the default value of this argument.)
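The bottom-up search for msc can be sketched in base R for crisp-set data. This is only an illustration of the procedure described above under simplifying assumptions (single-letter upper case factor names, 0/1 data, one fixed outcome); the function `find_msc` is hypothetical and is not how cna is implemented.

```r
# Bottom-up search for minimally sufficient conditions (crisp-set sketch).
find_msc <- function(dat, outcome, con.msc = 1, maxlen = 3) {
  y <- dat[[outcome]]
  factors <- setdiff(names(dat), outcome)
  # One literal per factor value: "A" for A=1, "a" for A=0.
  lits <- list()
  for (f in factors) {
    lits[[toupper(f)]] <- dat[[f]]
    lits[[tolower(f)]] <- 1 - dat[[f]]
  }
  found <- list()
  for (k in seq_len(min(maxlen, length(factors)))) {
    for (combo in combn(names(lits), k, simplify = FALSE)) {
      # Skip contradictory conjunctions such as A*a.
      if (anyDuplicated(toupper(combo))) next
      # Skip supersets of conjunctions already found sufficient.
      if (any(vapply(found, function(m) all(m %in% combo), logical(1)))) next
      member <- Reduce(pmin, lits[combo])
      # Only conjunctions instantiated in at least one case are tested.
      if (sum(member) == 0) next
      if (sum(pmin(member, y)) / sum(member) >= con.msc) {
        found[[length(found) + 1]] <- combo
      }
    }
  }
  vapply(found, paste, character(1), collapse = "*")
}

# Hypothetical ideal data generated from A*b + D <-> C.
dat <- expand.grid(A = 0:1, B = 0:1, D = 0:1)
dat$C <- pmax(pmin(dat$A, 1 - dat$B), dat$D)
msc_C <- find_msc(dat, "C")
```

Here D is found sufficient in the first pass, so every conjunction containing D is skipped afterwards; A*b is found in the second pass.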

In the next phase, minimally necessary disjunctions are built for each outcome by first testing whether single msc are necessary, then disjunctions of two msc, then of three, etc. (where a disjunction of msc counts as necessary iff it meets the threshold given by cov). Whenever a disjunction of msc (or a single msc) is found to be necessary, all supersets of that disjunction contain redundancies and are, thus, excluded from the further analysis. Finally, all and only those disjunctions of msc that meet both cov and con are issued as redundancy-free asf. To recover certain target structures in cases of noisy data, it may be useful to allow cna to also consider necessary conditions for further analysis that are not minimal. This can be accomplished by setting only.minimal.asf to FALSE, in which case all disjunctions of msc reaching the con and cov thresholds will be returned. (The ordinary user is advised not to change the default value of this argument.)
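The second phase can be sketched in the same spirit. The code below assumes that the membership scores of the msc are already available as a named list and that the outcome occurs in at least one case; `find_asf` is a hypothetical helper written for illustration, not part of the cna package.

```r
# Bottom-up search for minimally necessary disjunctions of msc (sketch).
find_asf <- function(msc_membership, y, con = 1, cov = 1, maxdisj = 3) {
  nm <- names(msc_membership)
  necessary <- list()
  for (k in seq_len(min(maxdisj, length(nm)))) {
    for (combo in combn(nm, k, simplify = FALSE)) {
      # Supersets of a necessary disjunction contain redundancies.
      if (any(vapply(necessary, function(a) all(a %in% combo), logical(1)))) next
      member <- Reduce(pmax, msc_membership[combo])
      # A disjunction counts as necessary iff it meets the cov threshold.
      if (sum(pmin(member, y)) / sum(y) >= cov) {
        necessary[[length(necessary) + 1]] <- combo
      }
    }
  }
  # Only disjunctions that also meet con are issued as asf.
  keep <- Filter(function(combo) {
    member <- Reduce(pmax, msc_membership[combo])
    sum(pmin(member, y)) / sum(member) >= con
  }, necessary)
  vapply(keep, paste, character(1), collapse = " + ")
}

# Hypothetical ideal data generated from A*b + D <-> C.
dat <- expand.grid(A = 0:1, B = 0:1, D = 0:1)
y   <- pmax(pmin(dat$A, 1 - dat$B), dat$D)
msc <- list("D" = dat$D, "A*b" = pmin(dat$A, 1 - dat$B))

asf_strict  <- find_asf(msc, y)             # cov = 1 requires both disjuncts
asf_relaxed <- find_asf(msc, y, cov = 0.8)  # D alone covers 4 of 5 cases
```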

As the combinatorial search space for asf is potentially too large to be exhaustively scanned in reasonable time, the argument maxstep allows for setting an upper bound for the complexity of the generated asf. maxstep takes a vector of three integers c(i,j,k) as input, entailing that the generated asf have maximally j disjuncts with maximally i conjuncts each and a total of maximally k factor values (k is the maximal complexity). The default is maxstep = c(3,3,9).
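To make the three components of maxstep concrete, the following sketch checks whether an asf written as a string stays within a given bound. The helper `within_maxstep` is hypothetical; it only parses the left-hand side of the formula and is not a function of the cna package.

```r
# Check an asf string against maxstep = c(i, j, k):
# at most i conjuncts per disjunct, at most j disjuncts, complexity at most k.
within_maxstep <- function(asf, maxstep = c(3, 3, 9)) {
  lhs <- strsplit(asf, "<->", fixed = TRUE)[[1]][1]
  disjuncts <- strsplit(gsub(" ", "", lhs), "+", fixed = TRUE)[[1]]
  n_conj <- lengths(strsplit(disjuncts, "*", fixed = TRUE))
  max(n_conj) <= maxstep[1] &&
    length(disjuncts) <= maxstep[2] &&
    sum(n_conj) <= maxstep[3]
}

ok_default <- within_maxstep("A*b + C <-> D")                  # 2 disjuncts, complexity 3
too_long   <- within_maxstep("A*B*C*D + E <-> F")              # 4 conjuncts in one disjunct
ok_wider   <- within_maxstep("A*B*C*D + E <-> F", c(4, 4, 10)) # fits a wider bound
```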

Note that the default con and cov thresholds of 1 will often not yield any asf because real-life data tend to feature noise due to uncontrolled background influences. In such cases, users should gradually lower con and cov (e.g. in steps of 0.05) until cna finds solution formulas. con and cov should only be lowered below 0.75 with great caution. If thresholds of 0.75 do not result in solutions, the corresponding data feature such a high degree of noise that there is a severe risk of causal fallacies.

If cna finds asf, it combines them to complex solution formulas (csf). Asf with identical outcomes are not combined, for they do not represent a complex causal structure but model ambiguities with respect to one outcome. Asf with different outcomes can be concatenated to csf using two different signs: "*" and ",". If asf1 and asf2 have at least one factor in common, they are combined to "asf1 * asf2"; if they have no common factor, they are combined to "asf1, asf2". That is, csf with "*" as main operator represent cohering complex causal structures and the degree of coherence in the analyzed data is issued as coherence score (cf. coherence). Csf with "," as main operator represent non-cohering structures. For instance, the two asf (D + U <-> L) and (G + L <-> E) can be combined to the cohering csf "(D + U <-> L) * (G + L <-> E)", which represents a causal chain from D + U via L to E, whereas (D + U <-> L) and (G + F <-> E) yield the non-cohering csf "(D + U <-> L), (G + F <-> E)".
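The choice between "*" and "," can be mimicked with simple string handling. The sketch below assumes crisp-set or fuzzy-set factor names consisting of letters only; `asf_factors` and `combine_asf` are hypothetical helpers used for illustration.

```r
# Factors mentioned in an asf, ignoring case (so "d" and "D" are one factor).
asf_factors <- function(asf) {
  unique(toupper(regmatches(asf, gregexpr("[A-Za-z]+", asf))[[1]]))
}

# Join two asf with "*" if they share a factor, with "," otherwise.
combine_asf <- function(asf1, asf2) {
  shared <- intersect(asf_factors(asf1), asf_factors(asf2))
  sep <- if (length(shared)) " * " else ", "
  paste0(asf1, sep, asf2)
}

chain  <- combine_asf("(D + U <-> L)", "(G + L <-> E)")
nochain <- combine_asf("(D + U <-> L)", "(G + F <-> E)")
```

The first call reproduces the cohering chain from the text, because L occurs in both asf; the second yields the non-cohering combination.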

The default output of cna lists asf and csf with consistency, coverage, and complexity scores. But cna can calculate a number of further solution attributes: inus, exhaustiveness, faithfulness, coherence, and redundant, all of which are recovered by setting details to its non-default value TRUE. These attributes require explication (see also the package vignette).

complexity: Complexity corresponds to the number of exogenous factors in a solution.

inus: The theory of causation underlying cna is called INUS-theory (Mackie 1974, ch. 3; Baumgartner 2008). Very roughly, it says that X is causally relevant to Y iff X is contained in a minimally necessary disjunction of minimally sufficient conditions of Y. It was originally designed for noise-free data that can be modeled with con = cov = 1. It turns out, however, that at consistency and coverage scores below 1 expressions can count as minimally necessary disjunctions of minimally sufficient conditions that, according to classical Boolean logic, could not possibly count as such at con = cov = 1. inus thus indicates whether or not a solution counts as an INUS solution relative to the strict criteria imposed by the INUS-theory for the case of con = cov = 1. If the user is only interested in INUS solutions, the argument inus.only is available; if inus.only = TRUE, only INUS solutions are built. The function behind the inus.only argument is also available as standalone function is.inus.

Exhaustiveness and faithfulness are two measures of model fit that quantify the degree of correspondence between the configurations that are, in principle, compatible with a solution and the configurations contained in the data from which that solution is derived. Roughly, exhaustiveness is high when all or most configurations compatible with a solution are in the data, whereas faithfulness is high when no or only few configurations that are incompatible with a solution are in the data. More specifically, exhaustiveness amounts to the ratio of the number of configurations in the data that are compatible with a solution to the number of configurations in total that are compatible with a solution. faithfulness amounts to the ratio of the number of configurations in the data that are compatible with a solution to the total number of configurations in the data. High exhaustiveness and faithfulness means that the configurations in the data are all and only the configurations that are compatible with the solution. Low exhaustiveness and/or faithfulness means that the data do not contain all configurations compatible with the solution and/or the data contain many configurations not compatible with the solution. In general, solutions with higher exhaustiveness and faithfulness scores are preferable over solutions with lower scores because they are better supported by the evidence in the data.
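The two ratios can be computed by hand for a small crisp-set example. The sketch below takes the solution A*b + D <-> C and a handful of made-up observations; it follows the verbal definitions given above and is not the implementation used by cna.

```r
factors <- c("A", "B", "D", "C")

# All logically possible configurations of the four factors.
full <- expand.grid(A = 0:1, B = 0:1, D = 0:1, C = 0:1)

# Configurations compatible with A*b + D <-> C: both sides take the same value.
lhs <- pmax(pmin(full$A, 1 - full$B), full$D)
compatible <- full[lhs == full$C, ]

# Hypothetical observed configurations (the last one contradicts the solution).
observed <- unique(data.frame(A = c(1, 1, 0, 0, 1),
                              B = c(0, 1, 1, 0, 0),
                              D = c(0, 0, 1, 0, 0),
                              C = c(1, 0, 1, 0, 0)))

# Represent each configuration as a string so that rows can be matched.
key <- function(d) do.call(paste0, d[factors])
n_both <- sum(key(observed) %in% key(compatible))

exhaustiveness <- n_both / nrow(compatible)  # observed share of compatible configurations
faithfulness   <- n_both / nrow(observed)    # compatible share of observed configurations
```

With 8 compatible configurations, 5 observed ones, and 4 in both sets, exhaustiveness is 0.5 and faithfulness is 0.8.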

For details on coherence scores see coherence. Finally, redundant, which is only attributed to csf, determines whether a csf contains structurally redundant proper parts. That is the case if the csf has a proper part that is logically equivalent with the whole csf (cf. Baumgartner and Falk 2018). A csf with redundant = TRUE should not be causally interpreted. Rather, it must be further processed by minimalizeCsf, which eliminates redundancies from csf. The function identifying structural redundancies is also available as standalone function redundant.

cna does not need to be told which factor(s) are endogenous; it can infer that from the data. Still, when prior causal knowledge about an investigated process is available, cna can be prohibited from treating certain factors as potential causes of other factors by means of the argument ordering. If specified, that argument defines a causal ordering for the factors in x. For example, ordering = list(c("A", "B"), "C") determines that C is causally located after A and B, meaning that C is not a potential cause of A and B. In consequence, cna only checks whether values of A and B can be modeled as causes of values of C; the test for a causal dependency in the other direction is skipped. If the argument ordering is not specified or if it is given the NULL value (which is the argument's default value), cna searches for dependencies between all factors in x. An ordering does not need to explicitly mention all factors in an analyzed data frame. If only a subset of the factors are included in the ordering, the non-included factors are entailed to be causally before the included ones. Hence, ordering = list("C"), for instance, means that C is causally located after all other factors in the data, meaning that C is the ultimate outcome of the structure under scrutiny.

The argument strict determines whether the elements of one level in an ordering can be causally related or not. For example, if ordering = list(c("A", "B"), "C") and strict = TRUE, then A and B, which are on the same level of the ordering, are excluded from being causally related and cna skips corresponding tests. By contrast, if ordering = list(c("A", "B"), "C") and strict = FALSE, then cna also searches for dependencies among A and B. The default is strict = FALSE. If the user knows prior to the analysis that the data contain exactly one endogenous factor E and that the remaining exogenous factors are mutually causally independent, the appropriate function call should feature cna(..., ordering = list("E"), strict = TRUE,...).
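One way to read the interplay of ordering and strict is as a rule that determines, for each factor, which other factors remain potential causes. The sketch below encodes that reading; `potential_causes` is a hypothetical helper, and treating all factors not mentioned in the ordering as one initial level is an assumption based on the description above.

```r
# Potential causes of `outcome` under a causal ordering (illustrative reading).
potential_causes <- function(factors, ordering, outcome, strict = FALSE) {
  # Factors not mentioned in the ordering are placed before all listed levels.
  unlisted <- setdiff(factors, unlist(ordering))
  levels <- if (length(unlisted)) c(list(unlisted), ordering) else ordering
  lvl <- which(vapply(levels, function(l) outcome %in% l, logical(1)))
  earlier <- as.character(unlist(levels[seq_len(lvl - 1)]))
  same <- setdiff(levels[[lvl]], outcome)
  # With strict = TRUE, factors on the same level are excluded.
  if (strict) earlier else c(earlier, same)
}

f <- c("A", "B", "C")
ord <- list(c("A", "B"), "C")
causes_C        <- potential_causes(f, ord, "C")                 # A and B
causes_A_loose  <- potential_causes(f, ord, "A")                 # B (same level)
causes_A_strict <- potential_causes(f, ord, "A", strict = TRUE)  # none
causes_last     <- potential_causes(f, list("C"), "C")           # A and B
```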

The argument notcols is used to calculate asf and csf for negative outcomes in data of type "cs" and "fs" (in "mv" data notcols has no meaningful interpretation and, correspondingly, issues an error message). If notcols = "all", all factors in x are negated, i.e. their membership scores i are replaced by 1-i. If notcols is given a character vector of factors in x, only the factors in that vector are negated. For example, notcols = c("A", "B") determines that only factors A and B are negated. The default is no negations, i.e. notcols = NULL.
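The effect of negating factors can be reproduced directly on a data frame. This is a minimal base R sketch with made-up fuzzy-set scores; it shows the replacement of each score i by 1-i and does not call cna itself.

```r
# Hypothetical fuzzy-set data with three factors.
dat <- data.frame(A = c(1, 0, 0.8),
                  B = c(0, 1, 0.3),
                  C = c(1, 1, 0.6))

# Negate only the factors named in notcols; all others stay untouched.
notcols <- c("A", "B")
negated <- dat
negated[notcols] <- 1 - dat[notcols]
```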

suff.only is applicable whenever a complete cna analysis cannot be performed for reasons of computational complexity. In such a case, suff.only = TRUE forces cna to stop the analysis after the identification of msc, which will normally yield results even in cases when a complete analysis does not terminate. In that manner, it is possible to shed at least some light on the dependencies among the factors in x, in spite of an incomputable solution space.

rm.const.factors and rm.dup.factors are used to determine the handling of constant factors, i.e. factors with constant values in all cases (rows) in x, and of duplicated factors, i.e. factors that take identical value distributions in all cases in x. If rm.const.factors = TRUE, which is the default value, constant factors are removed from the data prior to the analysis, and if rm.dup.factors = TRUE (the default) all but the first of a set of duplicated factors are removed. From the perspective of configurational causal modeling, factors with constant values in all cases can neither be modeled as causes nor as outcomes; therefore, they can be removed prior to the analysis. Factors that take identical values in all cases cannot be distinguished configurationally, meaning they are one and the same factor as far as configurational causal modeling is concerned. Therefore, only one factor of a set of duplicated factors is standardly retained by cna.
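The following base R sketch mimics the described preprocessing on a small made-up data frame. It is meant only to show what counts as a constant or duplicated factor; the actual removal is performed by truthTab.

```r
# Hypothetical data: B is constant, C duplicates A.
dat <- data.frame(A = c(1, 0, 1),
                  B = c(1, 1, 1),
                  C = c(1, 0, 1),
                  D = c(0, 0, 1))

# Drop factors that take the same value in all cases.
is_const <- vapply(dat, function(col) length(unique(col)) == 1, logical(1))
dat2 <- dat[!is_const]

# Keep only the first factor of every set of identical columns.
is_dup <- duplicated(as.list(dat2))
dat3 <- dat2[!is_dup]

kept <- names(dat3)
```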

The argument what can be specified both for the cna and the print function. It regulates what items of the output of cna are printed. If what is given the value “t”, the truth table is printed; if it is given an “m”, the msc are printed; if it is given an “a”, the asf are printed; if it is given a “c”, the csf are printed. what = "all" or what = "tmac" determines that all output items are printed. Note that what has no effect on the computations that will be performed when executing cna; it only determines how the result will be printed. The default output of cna is what = "ac". It first returns the implemented ordering. Second, the asf and, third, the csf are reported. If csf are the same as asf, this is indicated by "Same as asf". In case of suff.only = TRUE, what defaults to "m".

cna only includes factor configurations in the analysis that are actually instantiated in the data. The argument cutoff determines the minimum membership score required for a factor or a combination of factors to count as instantiated. It takes values in the unit interval [0,1] with a default of 0.5. border specifies whether factor combinations with membership scores equal to cutoff are rounded up (border = "up"), rounded down (border = "down"), which is the default, or dropped from the analysis (border = "drop").
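The roles of cutoff and border can be pictured with a small helper that decides whether a membership score counts as instantiated. This is an illustrative reading of the rule above; `instantiated` is hypothetical, and representing dropped scores as NA is an assumption made for the sketch.

```r
# TRUE = counts as instantiated, FALSE = does not, NA = dropped.
instantiated <- function(score, cutoff = 0.5,
                         border = c("down", "up", "drop")) {
  border <- match.arg(border)
  out <- score > cutoff
  at_border <- score == cutoff
  # Scores exactly on the cutoff are handled according to `border`.
  out[at_border] <- switch(border, up = TRUE, down = FALSE, drop = NA)
  out
}

scores <- c(0.2, 0.5, 0.9)
res_down <- instantiated(scores)                   # default border = "down"
res_up   <- instantiated(scores, border = "up")
res_drop <- instantiated(scores, border = "drop")
```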

The arguments digits, nsolutions, and show.cases apply to the print function, which takes an object of class “cna” as first input. digits determines how many digits of consistency, coverage, coherence, exhaustiveness, and faithfulness scores are printed, while nsolutions fixes the number of conditions and solutions to print. nsolutions applies separately to minimally sufficient conditions, atomic solution formulas, and complex solution formulas. nsolutions = "all" recovers all minimally sufficient conditions, atomic and complex solution formulas. show.cases is applicable if the what argument is given the value “t”. In that case, show.cases = TRUE yields a truth table featuring a “cases” column, which assigns cases to configurations.

The option “spaces” controls how the conditions are rendered. The current setting is queried by typing getOption("spaces"). The option specifies characters that will be printed with a space before and after them. The default is c("<->","->","+"). A more compact output is obtained with options(spaces = NULL).

References

Basurto, Xavier. 2013. “Linking Multi-Level Governance to Local Common-Pool Resource Theory using Fuzzy-Set Qualitative Comparative Analysis: Insights from Twenty Years of Biodiversity Conservation in Costa Rica.” Global Environmental Change 23(3):573-87.

Baumgartner, Michael. 2008. “Regularity Theories Reassessed.” Philosophia 36:327-354.

Baumgartner, Michael. 2009a. “Inferring Causal Complexity.” Sociological Methods & Research 38(1):71-101.

Baumgartner, Michael. 2009b. “Uncovering Deterministic Causal Structures: A Boolean Approach.” Synthese 170(1):71-96.

Baumgartner, Michael and Christoph Falk. 2018. “Boolean Difference-Making: A Modern Regularity Theory of Causation”. PhilSci Archive. url: http://philsciarchive.pitt.edu/id/eprint/14876.

Hartmann, Christof, and Joerg Kemmerzell. 2010. “Understanding Variations in Party Bans in Africa.” Democratization 17(4):642-65. DOI: 10.1080/13510347.2010.491189.

Krook, Mona Lena. 2010. “Women's Representation in Parliament: A Qualitative Comparative Analysis.” Political Studies 58(5):886-908.

Mackie, John L. 1974. The Cement of the Universe: A Study of Causation. Oxford: Oxford University Press.

Ragin, Charles C. 2006. “Set Relations in Social Research: Evaluating Their Consistency and Coverage”. Political Analysis 14(3):291-310.

Wollebaek, Dag. 2010. “Volatility and Growth in Populations of Rural Associations.” Rural Sociology 75:144-166.

Examples

# NOT RUN {
# Ideal crisp-set data from Baumgartner (2009a) on education levels in western democracies
#---------------------------------------------------------------------------------------
# Exhaustive CNA without constraints on the search space; print atomic and complex 
# solution formulas (default output).
cna.educate <- cna(d.educate)
cna.educate
# The two resulting complex solution formulas represent a common cause structure 
# and a causal chain, respectively. The common cause structure is graphically depicted 
# in (Note, figure (a)), the causal chain in (Note, figure (b)).

# Print only complex solution formulas.
print(cna.educate, what = "c")

# Print only atomic solution formulas.
print(cna.educate, what = "a")

# Print only minimally sufficient conditions.
print(cna.educate, what = "m")

# Print only the truth table.
print(cna.educate, what = "t")

# CNA with negations of the factors E and L.
cna(d.educate, notcols = c("E","L"))

# CNA with negations of all factors.
cna(d.educate, notcols = "all")

# Print msc, asf, and csf with all solution attributes.
cna(d.educate, what = "mac", details = TRUE)

# Add only the non-standard solution attributes "inus" and "faithfulness".
cna(d.educate, details = c("i", "f"))

# Print solutions without spaces before and after "+".
options(spaces = c("<->", "->" ))
cna(d.educate, details = c("i", "f"))

# Print solutions with spaces before and after "*".
options(spaces = c("<->", "->", "*" ))
cna(d.educate, details = c("i", "f"))

# Restore the default of the option "spaces".
options(spaces = c("<->", "->", "+"))


# Crisp-set data from Krook (2010) on representation of women in western-democratic parliaments
# -------------------------------------------------------------------------------------------
# This example shows that CNA can infer which factors are causes and which ones
# are effects from the data. Without being told which factor is the outcome, 
# CNA reproduces the original QCA of Krook (2010).
# }
# NOT RUN {
ana1 <- cna(d.women, maxstep = c(3, 4, 9), details = c("e", "f"))
ana1
# }
# NOT RUN {
# The two resulting asf only reach an exhaustiveness score of 0.438, meaning that
# not all configurations that are compatible with the asf are contained in the data
# "d.women". Here is how to extract the configurations that are compatible with 
# the first asf but are not contained in "d.women":
# }
# NOT RUN {
library(dplyr)
setdiff(tt2df(selectCases(asf(ana1)$condition[1], full.tt(d.women))),
        d.women)
# }
# NOT RUN {
# Highly ambiguous crisp-set data from Wollebaek (2010) on very high volatility of 
# grassroots associations in Norway
# --------------------------------------------------------------------------------
# csCNA with ordering from Wollebaek (2010) [Beware: due to massive ambiguities, this analysis
# will take about 20 seconds to compute.]
# }
# NOT RUN {
cna(d.volatile, ordering = list("VO2"), maxstep = c(6, 6, 16))
# }
# NOT RUN {
              
# Using suff.only, CNA can be forced to abandon the analysis after minimization of sufficient 
# conditions. [This analysis terminates quickly.]
cna(d.volatile, ordering = list("VO2"), maxstep = c(6, 6, 16), suff.only = TRUE)

# Similarly, by using the default maxstep, CNA can be forced to only search for asf and csf
# with reduced complexity. [This analysis also terminates quickly.]
cna(d.volatile, ordering = list("VO2"))


# Multi-value data from Hartmann & Kemmerzell (2010) on party bans in Africa
# ---------------------------------------------------------------------------
# mvCNA with causal ordering that corresponds to the ordering in Hartmann & Kemmerzell 
# (2010); coverage cutoff at 0.95 (consistency cutoff at 1), maxstep at (6, 6, 10).
cna.pban <- mvcna(d.pban, ordering = list(c("C","F","T","V"),"PB"), cov = .95,
                  maxstep = c(6, 6, 10), what = "all")
cna.pban

# The previous function call yields a total of 14 asf and csf, only 5 of which are 
# printed in the default output. Here is how to extract all 14 asf and csf.
asf(cna.pban)
csf(cna.pban)

# [Note that all of these 14 causal models reach considerably better consistency and 
# coverage scores than the one model Hartmann & Kemmerzell (2010) present in their paper, 
# which they generated using the TOSMANA software, version 1.3: 
# T=0 + T=1 + C=2 + T=1*V=0 + T=2*V=0 <-> PB=1
mvcond("T=0 + T=1 + C=2 + T=1*V=0 + T=2*V=0 <-> PB = 1", d.pban)

# That is, not only does TOSMANA fail to recover model ambiguities in this case, it 
# also issues a model whose fit is significantly below the models this data set would 
# warrant.] 

# Extract all minimally sufficient conditions.
msc(cna.pban)

# Alternatively, all msc, asf, and csf can be recovered by means of the nsolutions
# argument of the print function.
print(cna.pban, nsolutions = "all")

# Print the truth table with the "cases" column.
print(cna.pban, what = "t", show.cases = TRUE)

# Build solution formulas with maximally 4 disjuncts.
# }
# NOT RUN {
mvcna(d.pban, ordering = list(c("C","F","T","V"),"PB"), cov = .95, maxstep = c(4, 4, 10))

# Only print 2 digits of consistency and coverage scores.
print(cna.pban, digits = 2)

# Build all but print only two msc for each factor and two asf and csf.
print(mvcna(d.pban, ordering = list(c("C","F","T","V"),"PB"), cov = .95,
      maxstep = c(6, 6, 10), what = "all"), nsolutions = 2)

# Lowering the consistency instead of the coverage threshold yields further models with
# excellent fit scores; print only asf.
mvcna(d.pban, ordering = list(c("C","F","T","V"),"PB"), con = .93, what = "a",
      maxstep = c(6, 6, 10))

# Importing an ordering from prior causal knowledge is unnecessary for d.pban. PB  
# is the only factor in that data that could possibly be an outcome.
mvcna(d.pban, cov = .95, maxstep = c(6, 6, 10))
# }
# NOT RUN {
# Fuzzy-set data from Basurto (2013) on autonomy of biodiversity institutions in Costa Rica
# ---------------------------------------------------------------------------------------
# Basurto investigates two outcomes: emergence of local autonomy and endurance thereof. The 
# data for the first outcome is contained in rows 1-14 of d.autonomy, the data for the second
# outcome in rows 15-30. For each outcome, the author distinguishes between local ("EM",  
# "SP", "CO"),  national ("CI", "PO") and international ("RE", "CN", "DE") conditions. Here,   
# we first apply fsCNA to replicate the analysis for the local conditions of the endurance of 
# local autonomy.
dat1 <- d.autonomy[15:30, c("AU","EM","SP","CO")]
fscna(dat1, ordering = list("AU"), strict = TRUE, con = .9, cov = .9)

# The fsCNA model has significantly better consistency (and equal coverage) scores than the 
# model presented by Basurto (p. 580): SP*EM + CO <-> AU, which he generated using the 
# fs/QCA software.
fscond("SP*EM + CO <-> AU", dat1) # both EM and CO are redundant to account for AU

# If we allow for dependencies among the conditions by setting strict = FALSE, CNA reveals 
# that SP is a common cause of both AU and EM:
fscna(dat1, ordering = list("AU"), strict = FALSE, con = .9, cov = .9)

# Here is the analysis for the international conditions of autonomy endurance, which
# yields the same model presented by Basurto (plus one model Basurto does not mention):
dat2 <- d.autonomy[15:30, c("AU","RE", "CN", "DE")]
fscna(dat2, ordering = list("AU"), con = .9, con.msc = .85, cov = .85)

# But there are other models (here printed with all solution attributes)
# that fare equally well.
fscna(dat2, ordering = list("AU"), con = .85, cov = .9, details = TRUE)

# Finally, here is an analysis of the whole data set, showing that across the whole period 
# 1986-2006, the best causal model of local autonomy (AU) renders that outcome dependent
# only on local direct spending (SP):
# }
# NOT RUN {
fscna(d.autonomy, ordering = list("AU"), strict = TRUE, con = .85, cov = .9, 
                maxstep = c(5, 5, 11), details = TRUE)
# }
# NOT RUN {
# Only build INUS solutions.
# }
# NOT RUN {
asf(fscna(d.autonomy, ordering = list("AU"), strict = TRUE, con = .85, cov = .9, 
                    maxstep = c(5, 5, 11), details = TRUE, inus.only = TRUE))
# }
# NOT RUN {

# Highly ambiguous artificial data to illustrate exhaustiveness
# -------------------------------------------------------------
mycond <- "(D + C*f <-> A)*(C*d + c*D <-> B)*(B*d + D*f <-> C)*(c*B + B*f <-> E)"
dat1 <- selectCases(mycond)
# }
# NOT RUN {
ana1 <- cna(dat1, details = TRUE)
# There are almost 2M csf. This is how to build the first 360 of them:
csf360 <- csf(ana1, 360)
# Most of these csf are compatible with more configurations than are contained in 
# dat1. Only 32 of csf360 are perfectly exhaustive (i.e. all compatible 
# configurations are contained in dat1):
subset(csf360, exhaustiveness == 1)
# Eliminate structural redundancies.
minimalizeCsf(subset(csf360, exhaustiveness == 1)$condition, dat1)

# Inverse search trials to assess the correctness of cna
# ------------------------------------------------------
# 1. Ideal mv data, i.e. perfect consistencies and coverages, without data fragmentation.
# Define the target and generate data on the target.
target <- "(A=1*B=2 + A=4*B=3 <-> C=1)*(C=4*D=1 + C=2*D=4 <-> E=4)"
dat1 <- allCombs(c(4, 4, 4, 4, 4)) 
dat2 <- selectCases(target, dat1, type = "mv")
# Analyze the simulated data with cna.
test1 <- mvcna(dat2)
# Eliminate possible structural redundancies.
test1 <- minimalizeCsf(test1)
# Check whether a correctness-preserving submodel of the target is among the 
# returned solutions. 
is.submodel(test1$condition, target)

# Same test as above with data fragmentation, i.e. with non-ideal data:
# at most 100 of the 472 observable configurations are actually observed,
# because the 100 cases are drawn with replacement. [Repeated runs will generate different data.]
dat3 <- some(dat2, n = 100, replace = TRUE)
test2 <- mvcna(dat3)
test2 <- minimalizeCsf(test2, 50)
is.submodel(test2$condition, target)
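
# A base-R cross-check of the figure 472 quoted above. This is only a sketch that does
# not use cna; the names grid, lhs1, lhs2, compatible and drawn are illustrative.
# A configuration is compatible with the target iff both biconditionals hold in it.
grid <- expand.grid(A = 1:4, B = 1:4, C = 1:4, D = 1:4, E = 1:4)
lhs1 <- (grid$A == 1 & grid$B == 2) | (grid$A == 4 & grid$B == 3)
lhs2 <- (grid$C == 4 & grid$D == 1) | (grid$C == 2 & grid$D == 4)
compatible <- (lhs1 == (grid$C == 1)) & (lhs2 == (grid$E == 4))
sum(compatible)  # 472
# Drawing 100 cases with replacement can return the same configuration more than once,
# so the number of distinct configurations drawn is at most 100.
drawn <- sample(which(compatible), 100, replace = TRUE)
length(unique(drawn))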

# 2. Fs data with imperfect consistencies (con = 0.8) and coverages (cov = 0.8); 
# about 150 cases (depending on the seed). Randomly generated target asf. 
# [Repeated runs will generate different targets and data.]
target <- randomAsf(full.tt(5), compl = c(2,3))
outcome <- as.vector(sapply(cna:::extract_asf(target), cna:::rhs))
# Simulate the data with con = cov = 0.8.
dat1 <- allCombs(c(2, 2, 2, 2, 2)) - 1
dat2 <- some(truthTab(dat1), n = 200, replace = TRUE)
dat3 <- makeFuzzy(tt2df(dat2), fuzzvalues = seq(0, 0.45, 0.01))
dat4 <- selectCases1(target, con = .8, cov = .8, type = "fs", dat3)
# Analyze the simulated data with cna.
test3 <- fscna(dat4, ordering = list(outcome), strict = TRUE, con = .8, cov = .8)
# Check whether a correctness-preserving submodel of the target is among the 
# returned solutions. 
is.submodel(asf(test3)$condition, target)

# Same test as above with data fragmentation: only 80 of about 150 possible
# cases are actually observed. [Repeated runs will generate different data.]
dat5 <- some(dat4, n = 80, replace = TRUE)
# Run the analysis once, store the result, and print it.
test4 <- fscna(dat5, ordering = list(outcome), strict = TRUE, con = .8, cov = .8)
test4
is.submodel(asf(test4)$condition, target)
# Illustration of only.minimal.msc = FALSE
# ----------------------------------------
# Simulate noisy data on the causal structure "a*B*d + A*c*D <-> E"
set.seed(1324557857)
mydata <- allCombs(rep(2, 5)) - 1
dat <- makeFuzzy(mydata, fuzzvalues = seq(0, 0.5, 0.01))
dat <- tt2df(selectCases1("a*B*d + A*c*D <-> E", con = .8, cov = .8, dat))

# In dat, "a*B*d + A*c*D <-> E" has the following con and cov scores:
as.condTbl(fscond("a*B*d + A*c*D <-> E", dat))
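
# For orientation, a base-R sketch of how fuzzy-set consistency and coverage scores of
# this kind are computed. It is not the cna implementation; the helper fs_con_cov and
# the membership scores below are hypothetical. The sketch assumes the usual fuzzy-set
# definitions: con = sum(min(lhs, rhs)) / sum(lhs), cov = sum(min(lhs, rhs)) / sum(rhs),
# with negation as 1 - x, conjunction as the minimum and disjunction as the maximum.
fs_con_cov <- function(lhs, rhs) {
  overlap <- sum(pmin(lhs, rhs))
  c(con = overlap / sum(lhs), cov = overlap / sum(rhs))
}
lhs_toy <- c(0.2, 0.7, 0.9, 0.4)  # hypothetical membership in the left-hand side
rhs_toy <- c(0.3, 0.6, 1.0, 0.1)  # hypothetical membership in the outcome
fs_con_cov(lhs_toy, rhs_toy)
# Assuming dat has columns A to E, the same helper could be applied to the target:
# with(dat, fs_con_cov(pmax(pmin(1 - A, B, 1 - D), pmin(A, 1 - C, D)), E))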

# The standard algorithm of cna will, however, not find this structure with
# con = cov = 0.8 because one of the disjuncts (a*B*d) does not meet the con
# threshold:
as.condTbl(fscond(c("a*B*d <-> E", "A*c*D <-> E"), dat))
fscna(dat, ordering = list("E"), strict = TRUE, con = .8, cov = .8)

# With the argument con.msc we can lower the con threshold for msc, but this does not
# recover "a*B*d + A*c*D <-> E" either:
cna2 <- fscna(dat, ordering = list("E"), strict = TRUE, con = .8, cov = .8, con.msc = .7)
cna2
msc(cna2)

# The reason is that "a*B -> E" and "c*D -> E" now also meet the con.msc threshold, so
# neither "a*B*d -> E" nor "A*c*D -> E" is contained in the msc: both violate
# minimality. In a situation like this, lifting the minimality requirement via
# only.minimal.msc = FALSE allows cna to find the intended target:
fscna(dat, ordering = list("E"), strict = TRUE, con = .8, cov = .8, con.msc = .7,
      only.minimal.msc = FALSE)
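
# A base-R sketch of the minimality requirement discussed above. It is not the cna
# implementation; toy, toy_con and all membership scores are hypothetical. A sufficient
# conjunction is not minimally sufficient if one of its proper parts already meets the
# consistency threshold.
toy <- data.frame(A = c(0.1, 0.2, 0.9, 0.8),
                  B = c(0.9, 0.8, 0.2, 0.1),
                  D = c(0.2, 0.6, 0.9, 0.3),
                  E = c(0.9, 0.7, 0.1, 0.2))
toy_con <- function(lhs, rhs) sum(pmin(lhs, rhs)) / sum(lhs)
aB  <- pmin(1 - toy$A, toy$B)  # membership in a*B
aBd <- pmin(aB, 1 - toy$D)     # membership in a*B*d
con_aB  <- toy_con(aB, toy$E)
con_aBd <- toy_con(aBd, toy$E)
# Both conjunctions meet a threshold of 0.7 here, so a*B*d is sufficient but not
# minimally sufficient: its proper part a*B is already sufficient.
c(a_B = con_aB, a_B_d = con_aBd) >= 0.7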
# }
