Estimates the connectivity matrix of a directed causal graph, using various possible methods. Supported methods at the moment are ARGES, backShift, bivariateANM, bivariateCAM, CAM, FCI, FCI+, GES, GIES, hiddenICP, ICP, LINGAM, MMHC, rankARGES, rankFci, rankGES, rankGIES, rankPC, regression, RFCI and PC. Uses stability selection to select an appropriate sparseness.
getParentsStable(
X,
environment,
interventions = NULL,
EV = 1,
nodewise = TRUE,
threshold = 0.75,
nsim = 100,
sampleSettings = 1/sqrt(2),
sampleObservations = 1/sqrt(2),
parentsOf = 1:ncol(X),
method = c("ICP", "hiddenICP", "backShift", "pc", "LINGAM", "ges", "gies", "CAM",
"fci", "rfci", "regression", "bivariateANM", "bivariateCAM")[1],
alpha = 0.1,
mode = c("raw", "parental", "ancestral")[1],
variableSelMat = NULL,
excludeTargetInterventions = TRUE,
onlyObservationalData = FALSE,
indexObservationalData = NULL,
setOptions = list(),
verbose = FALSE
)
An (n x p) data matrix with n observations of p variables.
A vector of length n, where the entry for observation i is an index for the environment in which observation i took place (in the simplest case, entries 1 for observational data and entries 2 for interventional data of unspecified type). Required for methods ICP, hiddenICP and backShift.
An optional list of length n. The entry for observation i is a numeric vector that specifies the variables on which interventions happened for observation i (a scalar if an intervention happened on just one variable and numeric(0) if no intervention occurred for this observation). Used for method gies. It is also used to generate the vector environment if that is set to NULL (though this might generate too many different environments for some data, so a hand-picked vector environment is preferable), and by ICP and hiddenICP to exclude interventions on the target variable of interest.
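As an illustration of how a vector environment could be derived from the list interventions, the following base-R sketch gives every distinct set of intervened variables its own environment index. The labelling scheme and the name env_from_interventions are assumptions made for this example; the package's internal encoding may differ. The sketch also shows why many distinct intervention sets lead to many environments.

```r
# Hypothetical intervention list for 7 observations:
# numeric(0) means "no intervention", c(1, 3) means "interventions on variables 1 and 3".
interventions <- list(numeric(0), numeric(0), 2, 2, c(1, 3), 3, numeric(0))

# Turn each set of intervened variables into a text label ("" for observational data).
labels <- vapply(interventions,
                 function(v) paste(sort(v), collapse = ","),
                 character(1))

# Assumption for this sketch: one environment per distinct intervention set,
# numbered in order of first appearance.
env_from_interventions <- match(labels, unique(labels))

print(env_from_interventions)
```

Here four distinct intervention sets produce four environments, so with many different intervention targets a hand-picked environment vector may be easier to work with.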
A bound on the expected number of falsely selected edges.
If FALSE, stability selection retains for each subsample the largest overall entries in the connectivity matrix. If TRUE, values are ordered row- and node-wise first and then the largest entries in each row and column are retained. Error control is valid (under an exchangeability assumption) in both cases. The latter setting, TRUE, is perhaps more robust and is the default.
The empirical selection frequency in (0.5,1) under subsampling that needs to be surpassed for an edge to be selected.
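The arguments EV and threshold are linked through the error bound of the stability selection paper cited in the references: if each resample retains q out of a fixed number of candidate edges, the expected number of falsely selected edges is at most q^2 / ((2 * threshold - 1) * number of candidates). The sketch below picks the largest q that respects a given EV. The variable names and the choice of p * (p - 1) candidate edges are assumptions for illustration; the package's internal computation is not shown in this documentation and may differ, in particular when nodewise = TRUE.

```r
EV <- 1            # desired bound on the expected number of false edges
threshold <- 0.75  # selection frequency threshold in (0.5, 1)
p <- 10            # number of variables (hypothetical)

# Assumption: every ordered pair of distinct variables is a candidate edge.
n_candidates <- p * (p - 1)

# Largest q with q^2 / ((2 * threshold - 1) * n_candidates) <= EV.
q <- floor(sqrt(EV * (2 * threshold - 1) * n_candidates))

# Bound implied by this q; it does not exceed EV by construction.
implied_bound <- q^2 / ((2 * threshold - 1) * n_candidates)

print(c(q = q, implied_bound = implied_bound))
```

A higher threshold or a larger EV allows more edges to be retained per resample under the same bound.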
The number of resamples for stability selection.
The fraction of different environments to resample in each resampling (at least two different environments will be selected so the argument is without effect if there are just two different environments in total).
The fraction of samples to resample in each environment.
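To make the roles of sampleSettings and sampleObservations concrete, here is a minimal base-R sketch of drawing one resample: a fraction of the environments is chosen (at least two), and within each chosen environment a fraction of the observations is drawn without replacement. The rounding rule and the variable names are assumptions for this example, not the package's confirmed implementation.

```r
set.seed(1)

# Hypothetical data layout: 4 environments with 50 observations each.
environment <- rep(1:4, each = 50)
sampleSettings <- 1 / sqrt(2)
sampleObservations <- 1 / sqrt(2)

envs <- unique(environment)

# Number of environments to keep: the requested fraction, but at least two
# and never more than are available (rounding is an assumption).
n_env <- min(length(envs), max(2, round(sampleSettings * length(envs))))
chosen_envs <- envs[sample.int(length(envs), n_env)]

# Within each chosen environment, draw a fraction of its observations.
rows <- unlist(lapply(chosen_envs, function(e) {
  idx <- which(environment == e)
  idx[sample.int(length(idx), round(sampleObservations * length(idx)))]
}))

print(length(rows))
```

Repeating this draw nsim times and recording how often each edge is found gives the selection frequencies that are compared against threshold.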
The variables for which we would like to estimate the parents. The default is all variables.
A string that specifies the method to use. The methods pc (PC-algorithm), LINGAM (LINGAM), arges (Adaptively restricted greedy equivalence search), ges (Greedy equivalence search), gies (Greedy interventional equivalence search), fci (Fast causal inference) and rfci (Really fast causal inference) are imported from the package "pcalg" and are documented there in more detail, including the additional options that can be supplied via setOptions. The method CAM (Causal additive models) is documented in the package "CAM", and the methods ICP (Invariant causal prediction) and hiddenICP (Invariant causal prediction with hidden variables) are from the package "InvariantCausalPrediction". The method backShift comes from the package "backShift". The method mmhc comes from the package "bnlearn". Finally, the methods bivariateANM and bivariateCAM are for now implemented internally but will hopefully be part of another package at some point in the near future.
The level at which tests are done. This leads to confidence intervals for ICP and hiddenICP and is used internally for pc and rfci.
Output type: one of "raw", "parental" or "ancestral". If "raw", the output is that of the underlying method, without modifications. If "parental", the output describes parental relations; if "ancestral", the output is cast to ancestral relations.
An optional logical matrix of dimension (p x p). An entry TRUE for entry (i,j) says that variable i should be considered as a potential parent for variable j, and an entry FALSE says that it should not. If the default value of NULL is used, all variables will be considered, but this can be very slow, especially for methods pc, ges, gies, rfci and CAM.
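A small base-R sketch of building such a matrix for variableSelMat follows. It starts from "everything allowed" and then restricts the potential parents of variable 3 to variables 1 and 2. The dimension p = 4 and the particular restriction are made up for illustration.

```r
p <- 4  # number of variables (hypothetical)

# Start with every variable allowed as a potential parent of every other.
variableSelMat <- matrix(TRUE, nrow = p, ncol = p)

# Column j lists the potential parents of variable j:
# allow only variables 1 and 2 as potential parents of variable 3.
variableSelMat[, 3] <- FALSE
variableSelMat[c(1, 2), 3] <- TRUE

print(variableSelMat)
```

Because entry (i,j) refers to "i as a potential parent of j", restrictions on the parents of one variable are always edits to a single column.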
When looking for parents of variable k in 1,...,p, set to TRUE if observations where an intervention on variable k occurred should be excluded. Default is TRUE.
If set to TRUE, only observational data is used. It will take the index in environment specified by indexObservationalData. If environment is NULL, all observations are used. Default is FALSE.
Index in environment that encodes observational data. Default is 1.
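The effect of onlyObservationalData together with indexObservationalData can be pictured as a row filter on X. The following base-R sketch shows that filter on toy data; it is an illustration of the described behaviour under made-up inputs, not the package's code.

```r
# Toy inputs: 6 observations of 2 variables, environment 1 is observational.
environment <- c(1, 1, 2, 2, 3, 1)
X <- matrix(seq_len(12), nrow = 6, ncol = 2)
indexObservationalData <- 1

# Keep only the rows whose environment index marks observational data.
keep <- environment == indexObservationalData
X_obs <- X[keep, , drop = FALSE]

print(X_obs)
```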
A list that can take method-specific options; see the individual documentations of the methods for more options and their possible values.
If TRUE, detailed output is provided.
A sparse matrix, where a 0 entry in (j,k) corresponds to an estimate of 'no edge' j -> parentsOf[k]. Entries between 0 and 100 give the selection percentage of this edge over all resamples (set to 0 if below the critical threshold), and all non-zero values are considered as selected edges.
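The returned matrix can be turned into an edge list by reading off its non-zero entries and mapping column k back to parentsOf[k]. The sketch below does this on a small hand-made dense matrix that only mimics the described layout; the values are invented, and for a real sparse result converting with as.matrix() first is one option.

```r
# Hand-made stand-in for a result with parentsOf = c(2, 3) and p = 4 variables.
parentsOf <- c(2, 3)
res <- matrix(0, nrow = 4, ncol = length(parentsOf))
res[1, 1] <- 92  # edge 1 -> parentsOf[1] = 2, selected in 92% of resamples
res[2, 2] <- 81  # edge 2 -> parentsOf[2] = 3, selected in 81% of resamples

# Non-zero entries are the selected edges; columns index into parentsOf.
edges <- which(res != 0, arr.ind = TRUE)
selected <- data.frame(
  from    = edges[, "row"],
  to      = parentsOf[edges[, "col"]],
  percent = res[edges]
)

print(selected)
```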
Stability selection (2010): N. Meinshausen and P. Bühlmann, Journal of the Royal Statistical Society: Series B, 72, 417-473
getParents for the underlying point-estimate of the causal graph.
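A minimal usage sketch on simulated data is given below. The data-generating chain, the size of the environment shifts and the package name CompareCausalNetworks in the namespace check are assumptions for this example. The call itself uses only arguments listed in the usage above and is skipped when the required packages are not installed.

```r
set.seed(1)

# Simulated data: chain X1 -> X2 -> X3, with X4 independent of the rest.
# Environments 2 and 3 shift the mean of X1 (interventions of unspecified type).
n <- 300
environment <- rep(1:3, each = n / 3)
shift <- c(0, 2, -2)[environment]
X1 <- rnorm(n) + shift
X2 <- 0.8 * X1 + rnorm(n)
X3 <- 0.8 * X2 + rnorm(n)
X4 <- rnorm(n)
X <- cbind(X1, X2, X3, X4)

# Assumption: the function is provided by the package "CompareCausalNetworks"
# and method "ICP" additionally needs "InvariantCausalPrediction".
if (requireNamespace("CompareCausalNetworks", quietly = TRUE) &&
    requireNamespace("InvariantCausalPrediction", quietly = TRUE)) {
  res <- CompareCausalNetworks::getParentsStable(
    X, environment,
    method = "ICP",
    EV = 1,
    threshold = 0.75,
    nsim = 100,
    alpha = 0.1
  )
  # Entry (j, k) is the selection percentage of edge j -> parentsOf[k].
  print(res)
}
```

With the default parentsOf = 1:ncol(X), column k of the result refers to variable k.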