randomConds: Generate random solution formulas

Description

Based on a set of factors and a corresponding data type, given as a data frame or truthTab, randomAsf generates a random atomic solution formula (asf) and randomCsf generates a random acyclic complex solution formula (csf).

Usage

randomAsf(x, outcome = NULL, compl = NULL, how = c("inus", "minimal"))
randomCsf(x, outcome = NULL, n.asf = NULL, compl = NULL)

Arguments

x

Data frame or truthTab; determines the number of factors, their names and their possible values.

outcome

Optional character vector (of length 1 in randomAsf) specifying the outcome factor(s) in the solution formula; must be a subset of names(x).

compl

Integer vector specifying the maximal complexity of the formula, i.e. the number of factors in each minimally sufficient condition (msc) and the number of msc in each asf.

how

Character string, either "inus" or "minimal", specifying whether the generated solution formula is redundancy-free relative to full.tt(x) or relative to x (see details).

n.asf

Integer scalar specifying the number of asf in the csf. It is overridden by length(outcome) if outcome is not NULL. Note that n.asf is limited to ncol(x)-2.

Value

The randomly generated formula, a character string.

Details

randomAsf and randomCsf can be used to randomly draw data generating structures (ground truths) in inverse search trials that benchmark the output of cna. In the regularity theoretic context in which the CNA method is embedded, a causal structure is a redundancy-free Boolean dependency structure. Hence, randomAsf and randomCsf both produce redundancy-free Boolean dependency structures. randomAsf generates structures with one outcome, i.e. atomic solution formulas (asf), whereas randomCsf generates structures with multiple outcomes, i.e. complex solution formulas (csf), that are free of cyclic substructures. In a nutshell, randomAsf first randomly draws disjunctive normal forms (DNFs) and then eliminates redundancies from these DNFs. randomCsf essentially consists of repeated executions of randomAsf.
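The two-step idea described above (draw a DNF, then remove redundancies) can be illustrated with a small base R sketch for binary factors. This is a toy illustration only and not the implementation used by randomAsf; the helper names evalDnf, drawDnf, eliminate and showDnf are hypothetical, and redundancy is checked relative to all logically possible configurations.

```r
set.seed(1)  # only to make this illustration reproducible

factors <- c("A", "B", "C", "D")
# All logically possible configurations of the binary factors.
tt <- expand.grid(rep(list(0:1), length(factors)))
names(tt) <- factors

# Truth values of a DNF (a list of named 0/1 vectors) on every row of tt.
evalDnf <- function(dnf, tt) {
  out <- rep(FALSE, nrow(tt))
  for (conj in dnf) {
    hit <- rep(TRUE, nrow(tt))
    for (f in names(conj)) hit <- hit & tt[[f]] == conj[[f]]
    out <- out | hit
  }
  out
}

# Step 1: draw a random DNF with nConj conjunctions of nLit literals each.
drawDnf <- function(factors, nConj = 3, nLit = 2) {
  lapply(seq_len(nConj), function(i) {
    fs <- sample(factors, nLit)
    setNames(sample(0:1, nLit, replace = TRUE), fs)
  })
}

# Step 2: drop literals and disjuncts as long as the truth values on tt
# stay the same.
eliminate <- function(dnf, tt) {
  target <- evalDnf(dnf, tt)
  repeat {
    changed <- FALSE
    for (i in seq_along(dnf)) {
      for (f in names(dnf[[i]])) {
        if (length(dnf[[i]]) < 2) break  # never empty a conjunction
        cand <- dnf
        cand[[i]] <- dnf[[i]][names(dnf[[i]]) != f]
        if (identical(evalDnf(cand, tt), target)) {
          dnf <- cand
          changed <- TRUE
        }
      }
    }
    i <- 1
    while (i <= length(dnf) && length(dnf) > 1) {
      if (identical(evalDnf(dnf[-i], tt), target)) {
        dnf <- dnf[-i]
        changed <- TRUE
      } else {
        i <- i + 1
      }
    }
    if (!changed) break
  }
  dnf
}

# Upper case = factor present, lower case = factor absent.
showDnf <- function(dnf) {
  paste(vapply(dnf, function(conj) {
    paste(ifelse(conj == 1, names(conj), tolower(names(conj))), collapse = "*")
  }, character(1)), collapse = " + ")
}

raw <- drawDnf(factors)
reduced <- eliminate(raw, tt)
cat("drawn:  ", showDnf(raw), "\n")
cat("reduced:", showDnf(reduced), "\n")
```

The reduced DNF has the same truth values as the drawn one on every row of tt, but it may contain fewer literals or disjuncts.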

The only mandatory argument of randomAsf and randomCsf is a data frame or a truthTab x defining the factors (with their possible values) from which the generated asf and csf shall be drawn. If asf and csf are built from multi-value or fuzzy-set factors, x must be a truthTab.

The optional argument outcome determines which factors in x shall be treated as outcomes. If outcome is at its default value NULL, randomAsf and randomCsf randomly draw factor(s) from x to be treated as outcome(s).

The argument compl controls the complexity of the generated asf and csf. More specifically, the initial complexity of asf and csf (i.e. the number of factors included in msc and the number of msc included in asf prior to redundancy elimination) is drawn from the vector compl. As this complexity might be reduced in the subsequent process of redundancy elimination, the returned asf or csf will often have lower complexity than specified in compl. The default value of compl is determined by the number of columns in x. Assigning unduly high values to compl results in an error.
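To see how far the complexity of a returned asf falls below compl, the formula string can be taken apart in base R. The following is a minimal sketch; asfComplexity is a hypothetical helper, and the formula is a hand-written string in cna syntax, not actual randomAsf output.

```r
# Hand-written asf in cna syntax (for illustration only).
asf <- "A*b + C*D*e + F <-> G"

# Count the msc (disjuncts) of an asf and the factors in each msc.
asfComplexity <- function(asf) {
  lhs <- trimws(strsplit(asf, "<->", fixed = TRUE)[[1]][1])
  msc <- trimws(strsplit(lhs, "+", fixed = TRUE)[[1]])
  list(n.msc = length(msc),
       factors.per.msc = lengths(strsplit(msc, "*", fixed = TRUE)))
}

res <- asfComplexity(asf)
res$n.msc            # 3
res$factors.per.msc  # 2 3 1
```

Applied with lapply to the result of, say, replicate(20, randomAsf(full.tt(7), compl = 3)), such a helper would show how often redundancy elimination leaves fewer than three msc or fewer than three factors per msc.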

randomAsf has the additional argument how with the two possible values "inus" and "minimal". how = "inus" determines that the generated asf is redundancy-free relative to all logically possible configurations of the factors in x, i.e. relative to full.tt(x), whereas with how = "minimal" redundancy-freeness is imposed only relative to the configurations actually contained in x, i.e. relative to x itself. Typically "inus" should be used; the value "minimal" is relevant mainly in repeated randomAsf calls from within randomCsf. Moreover, setting how = "minimal" raises an error if x is a truthTab of type "fs".

The argument n.asf controls the number of asf in the generated csf. Its value is limited to ncol(x)-2 and is overridden by length(outcome) if outcome is not NULL. Analogously to compl, n.asf specifies the number of asf prior to redundancy elimination, which may remove some of them. That is, n.asf provides an upper bound for the number of asf in the resulting csf.
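Because n.asf is only an upper bound, it can be useful to count the asf actually contained in a returned csf. The sketch below splits a csf string into its asf and extracts the outcomes using base R; the csf is a hand-written string in cna syntax, not actual randomCsf output, and the parsing assumes that each asf is enclosed in parentheses and joined by "*".

```r
# Hand-written csf in cna syntax (for illustration only).
csf <- "(A*b + C <-> D)*(D*e + f <-> G)"

# Strip the outer parentheses, then split at the ")*(" between two asf.
asfs <- strsplit(gsub("^\\(|\\)$", "", csf), ")*(", fixed = TRUE)[[1]]
# The outcome of each asf is whatever follows "<->".
outcomes <- trimws(sub(".*<->", "", asfs))

length(asfs)  # 2
outcomes      # "D" "G"
```

Run over many formulas drawn with a fixed n.asf, table() of these counts would show how often the resulting csf contains fewer asf than requested.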

Examples

# NOT RUN {
# randomAsf
# ---------
# Asf generated from explicitly specified binary factors.
randomAsf(full.tt("H*I*T*R*K"))
randomAsf(full.tt("Johnny*Debby*Aurora*Mars*James*Sonja"))

# Asf generated from a specified number of binary factors.
randomAsf(full.tt(7))

# Asf generated from an existing data frame.
randomAsf(d.educate)

# Specify the outcome.
randomAsf(d.educate, outcome = "G")

# Specify the complexity.
randomAsf(full.tt(7), compl = 2)
randomAsf(full.tt(7), compl = 3:4)

# Redundancy-freeness relative to x instead of full.tt(x).
randomAsf(d.educate, outcome = "G", how = "minimal")

# Asf with multi-value factors (x must be given as a truthTab).
randomAsf(mvtt(allCombs(c(3,4,3,5,3,4))))

# Asf from fuzzy-set data (x must be given as a truthTab).
randomAsf(fstt(d.jobsecurity))
randomAsf(fstt(d.jobsecurity), outcome = "JSR")

# Generate 20 asf.
replicate(20, randomAsf(full.tt(7), compl = 2:3))

# randomCsf
# ---------
# Csf generated from explicitly specified binary factors.
randomCsf(full.tt("H*I*T*R*K*Q*P"))

# Csf generated from a specified number of binary factors.
randomCsf(full.tt(7))

# Specify the outcomes.
randomCsf(d.volatile, outcome = c("RB","SE"))

# Specify the complexity.
randomCsf(d.volatile, outcome = c("RB","SE"), compl = 2)
randomCsf(full.tt(7), compl = 3:4)

# Specify the number of asf.
randomCsf(full.tt(7), n.asf = 3)

# Csf with multi-value factors (x must be given as a truthTab).
randomCsf(mvtt(allCombs(c(3,4,3,5,3,4))))

# Generate 20 csf.
replicate(20, randomCsf(full.tt(7), n.asf = 2, compl = 2:3))


# Inverse searches
# ----------------
# === Ideal Data ===
# Draw the data generating structure. (Every run yields different 
# targets and data.)
target <- randomCsf(full.tt(5), n.asf = 2)
target
# Select the cases compatible with the target.
x <- selectCases(target)
# Run CNA without an ordering.
mycna <- cna(x, maxstep = c(4, 4, 12), rm.dup.factors = FALSE)
# Extract the first 100 csf (depending on the seed, there may be
# more than 100 csf).
csfs <- csf(mycna, 100)
# Eliminate possible structural redundancies from the csf.
min.csfs <- minimalizeCsf(csfs$condition, x)$condition
# Check whether the target is completely returned.
any(unlist(lapply(min.csfs, identical.model, target)))

# === Data fragmentation (20% missing observations) ===
# Draw the data generating structure. (Every run yields different 
# targets and data.)
target <- randomCsf(full.tt(7), n.asf = 2)
target
# Generate the complete data.
x <- tt2df(selectCases(target))
# Introduce fragmentation.
x <- x[-sample(seq_len(nrow(x)), floor(nrow(x) * 0.2)), ]
# Run CNA without an ordering.
mycna <- cna(x, maxstep = c(4, 4, 12), rm.dup.factors = FALSE)
# Extract and minimize the first 100 csf (depending on the seed, there may be
# more than 100 csf).
csfs <- csf(mycna, 100)
min.csfs <- minimalizeCsf(csfs$condition, x)
# Check whether (a submodel of) the target is actually returned.
any(is.submodel(min.csfs$condition, target))

# === Data fragmentation and noise (20% missing observations, noise ratio of 0.05) ===
# Multi-value data.
# Draw the data generating structure. (Every run yields different 
# targets and data.)
fullData <- mvtt(allCombs(c(4,4,4,4,4)))
target <- randomCsf(fullData, n.asf=2, compl = 2:3)
target
# Generate the complete data.
x <- tt2df(selectCases(target, fullData))
# Introduce fragmentation.
x <- x[-sample(seq_len(nrow(x)), floor(nrow(x) * 0.2)), ]
# Introduce random noise.
x <- rbind(tt2df(fullData[sample(seq_len(nrow(fullData)), floor(nrow(x) * 0.05)), ]), x)
# Run CNA without an ordering.
mycna <- mvcna(x, con = .75, cov = .75, maxstep = c(3, 3, 12), rm.dup.factors = FALSE)
# Extract and minimize the first 100 csf (depending on the seed, there may be
# more than 100 csf).
csfs <- csf(mycna, 100)
min.csfs <- if (nrow(csfs) > 0) {
  as.vector(minimalizeCsf(csfs$condition, mvtt(x))$condition)
} else {
  NA
}
# Check whether no causal fallacy (no false positive) is returned.
if (length(min.csfs) == 1 && is.na(min.csfs)) {
  TRUE
} else {
  any(is.submodel(min.csfs, target))
}

# }
