Under the assumption that the distribution of the observed variables
is faithful to a DAG, this function estimates the Markov equivalence class of
the DAG. We do not estimate the DAG itself, because this is typically
impossible (even with an infinite amount of data), since different
DAGs can describe the same conditional independence relationships.
Since all DAGs in an equivalence class describe the same conditional
independence relationships, they are equally valid ways to
describe the conditional dependence structure that was given as
input.
All DAGs in a Markov equivalence class have the same skeleton (i.e.,
the same adjacency information) and the same v-structures (see
definition below). However, the direction of some edges may be
undetermined, in the sense that they point one way in one DAG in the
equivalence class, while they point the other way in another DAG in
the equivalence class.
A Markov equivalence class can be uniquely represented by a completed
partially directed acyclic graph (CPDAG). A CPDAG
contains undirected and directed edges. The edges have the following
interpretation: (i) there is a (directed or undirected) edge between i
and j if and only if variables i and j are conditionally dependent
given S for all possible subsets S of the remaining nodes; (ii) a directed
edge \(i \longrightarrow j\) means that this directed edge is
present in all DAGs in the Markov equivalence class; (iii) an undirected
edge \(i - j\) means that there is at least one DAG in the Markov
equivalence class with edge \(i \longrightarrow j\) and
there is at least one DAG in the Markov equivalence class with edge
\(i \longleftarrow j\).
The CPDAG is estimated using the PC algorithm (named after its inventors
Peter Spirtes and Clark Glymour). The skeleton is
estimated by the function skeleton
which uses a modified
version of the original PC algorithm (see Colombo and Maathuis (2014) for
details). The original PC algorithm is known to be
order-dependent, in the sense that the output depends on the order in
which the variables are given. Therefore, Colombo and Maathuis (2014)
proposed a simple modification, called PC-stable, that yields
order-independent adjacencies in the skeleton (see the help file
of this function for details). Subsequently, as many edges as possible
are oriented. This is done in two steps. It is important to note that
if no further actions are taken (see below) these two steps still
remain order-dependent.
The edges are oriented as follows. First, the algorithm considers all
triples (a,b,c)
, where \(a\) and \(b\) are adjacent, \(b\) and
\(c\) are adjacent, but \(a\) and \(c\) are not adjacent. For all such
triples, we direct both edges towards \(b\)
(\(a \longrightarrow b \longleftarrow c\)) if and only if
\(b\) was not part of the conditioning set that made the edge between
\(a\) and \(c\) drop out. These conditioning sets were saved in
sepset
. The structure
\(a \longrightarrow b \longleftarrow c\) is called a
v-structure.
After determining all v-structures, there may still
be undirected edges. It may be possible to direct some of these edges, since
one can deduce that one of the two possible directions of the edge is
invalid because it introduces
a new v-structure or a directed cycle. Such edges are found by
repeatedly applying rules R1-R3 of the PC algorithm as given in
Algorithm 2 of Kalisch and Bühlmann (2007). The algorithm stops if
none of the rules is applicable to the graph.
The conservative PC algorithm (conservative = TRUE
) is a
slight variation of the PC algorithm (see Ramsey et al. 2006). After
the skeleton is computed, all potential v-structures \(a - b - c\) are
checked in the following way. We test whether a and c are independent
conditioning on all subsets of the neighbors of \(a\) and all subsets of
the neighbors of \(c\). When a subset makes \(a\) and \(c\)
conditionally independent, we call it a separating set. If \(b\) is in no
such separating set or in all such separating sets, no further action is
taken and the usual PC is continued. If, however, \(b\) is in only some
separating sets, the triple \(a - b - c\) is marked as 'ambiguous'.
Moreover, if no separating set is found among the neighbors, the triple is
also marked as 'ambiguous'. An ambiguous triple is not oriented as a
v-structure. Furthermore, no further orientation rule that needs to
know whether \(a - b - c\) is a v-structure or not is applied. Instead of
using the conservative version, which is quite strict towards the
v-structures, Colombo and Maathuis (2014) introduced a less strict
version for the v-structures called majority rule. This adaptation can
be called using maj.rule = TRUE
. In this case, the triple
\(a - b - c\) is marked as 'ambiguous' if and only if \(b\) is in
exactly 50 percent of such separating sets or no separating set was found.
If \(b\) is in less than 50 percent of the separating sets it is set as a
v-structure, and if in more than 50 percent it is set as a non v-structure
(for more details see Colombo and Maathuis, 2014). The usage of both the
conservative and the majority rule versions resolve the
order-dependence issues of the determination of the v-structures.
Sampling errors (or hidden variables) can lead to conflicting
information about edge directions. For example, one may find that
\(a - b - c\) and \(b - c - d\) should both be directed as v-structures.
This gives conflicting information about the edge \(b - c\), since it should
be directed as \(b \longleftarrow c\) in v-structure
\(a \longrightarrow b \longleftarrow c\), while it should be
directed as \(b \longrightarrow c\) in v-structure
\(b \longrightarrow c \longleftarrow d\). With the option
solve.confl = FALSE
, in such cases, we simply overwrite the
directions of the conflicting edge. In the example above this means
that we obtain
\(a \longrightarrow b \longrightarrow c \longleftarrow d\)
if \(a - b - c\) was visited first, and
\(a \longrightarrow b \longleftarrow c \longleftarrow d\)
if \(b - c - d\) was visited first, meaning that the final orientation on
the edge depends on the ordering in which the v-structures were
considered. With the option solve.confl = TRUE
(which is only
supported with option u2pd = "relaxed"
), we first generate a list
of all (unambiguous) v-structures (in the example above \(a - b - c\) and
\(b - c - d\)), and then we simply orient them allowing both directions on
the edge \(b - c\), namely we allow the bi-directed edge
\(b \leftrightarrow c\) resolving the order-dependence issues on the
edge orientations. We denote bi-directed edges in the adjacency matrix
\(M\) of the graph as M[b,c] = 2
and M[c,b] = 2
. In a similar
way, using lists for the candidate edges for each orientation rule and
allowing bi-directed edges, the order-dependence issues in the orientation
rules can be resolved. Note that bi-directed edges merely represent a
conflicting orientation and they should not to be interpreted causally. The
useage of these lists for the candidate edges and allowing bi-directed edges
resolves the order-dependence issues on the orientation of the v-structures
and on the orientation rules, see Colombo and Maathuis (2014) for
more details.
Note that calling (conservative = TRUE
), or maj.rule =
TRUE
, together with solve.confl = TRUE
produces a fully
order-independent output, see Colombo and Maathuis (2014).
Sampling errors, non faithfulness, or hidden variables can also lead
to non-extendable CPDAGs, meaning that there does not exist a DAG that
has the same skeleton and v-structures as the graph found by the
algorithm. An example of this is an undirected cycle consisting of the
edges \(a - b - c - d\) and \(d - a\). In this case it is impossible to
direct the edges without creating a cycle or a new v-structure. The option
u2pd
specifies what should be done in such a situation. If the
option is set to "relaxed"
, the algorithm simply outputs the
invalid CPDAG. If the option is set to "rand"
, all direction
information is discarded and a random DAG is generated on the
skeleton, which is then converted into its CPDAG. If the option is set
to "retry"
, up to 100 combinations of possible directions of
the ambiguous edges are tried, and the first combination that results
in an extendable CPDAG is chosen. If no valid combination is found, an
arbitrary DAG is generated on the skeleton as in the option "rand",
and then converted into its CPDAG. Note that the output can also be an
invalid CPDAG, in the sense that it cannot arise from the oracle PC
algorithm, but be extendible to a DAG, for example
\(a \longrightarrow b \longleftarrow c \longleftarrow d\).
In this case, u2pd
is not used.
Using the function isValidGraph
one can check if the final output is indeed a valid CPDAG.
Notes: (1) Throughout, the algorithm works with the column positions
of the variables in the adjacency matrix, and not with the names of
the variables. (2) When plotting the object, undirected and bidirected
edges are equivalent.