ges: Estimate the Markov equivalence class of a DAG using GES

Description

Estimate the observational essential graph representing the Markov equivalence class of a DAG using the greedy equivalence search (GES) algorithm of Chickering (2002).

Usage

ges(score, labels = score$getNodes(),
    fixedGaps = NULL, adaptive = c("none", "vstructures", "triples"), 
    phase = c("forward", "backward", "turning"), iterate = length(phase) > 1,
    turning = NULL, maxDegree = integer(0), verbose = FALSE, ...)

Value

ges returns a list with the following two components:

essgraph: An object of class EssGraph containing an estimate of the equivalence class of the underlying DAG.
repr: An object of a class derived from ParDAG containing a (random) representative of the estimated equivalence class.

Arguments

score

An instance of a class derived from Score which only accounts for observational data. If the dataset is high-dimensional (p>=n) ges might not be able to terminate.

labels

Node labels; by default, they are determined from the scoring object.

fixedGaps

logical symmetric matrix of dimension p*p. If entry [i, j] is TRUE, the result is guaranteed to have no edge between nodes \(i\) and \(j\).

adaptive

indicating whether constraints should be adapted to newly detected v-structures or unshielded triples (cf. details).

phase

Character vector listing the phases that should be used; possible values: forward, backward, and turning (cf. details).

iterate

Logical indicating whether the phases listed in the argument phase should be iterated more than once (iterate = TRUE) or not.

turning

Setting turning = TRUE is equivalent to setting phases = c("forward", "backward") and iterate = FALSE; the use of the argument turning is deprecated.

maxDegree

Parameter used to limit the vertex degree of the estimated graph. Valid arguments:

Vector of length 0 (default): vertex degree is not limited.
Real number \(r\), \(0 < r < 1\): degree of vertex \(v\) is limited to \(r \cdot n_v\), where \(n_v\) denotes the number of data points where \(v\) was not intervened.
Single integer: uniform bound of vertex degree for all vertices of the graph.
Integer vector of length p: vector of individual bounds for the vertex degrees.

verbose

If TRUE, detailed output is provided.

...

Additional arguments for debugging purposes and fine tuning.

Author

Alain Hauser (alain.hauser@bfh.ch)

Details

Under the assumption that the distribution of the observed variables is faithful to a DAG, this function estimates the Markov equivalence class of the DAG. It does not estimate the DAG itself, because this is typically impossible (even with an infinite amount of data): different DAGs (forming a Markov equivalence class) can describe the same conditional independence relationships and be statistically indistinguishable from observational data alone.

All DAGs in an equivalence class have the same skeleton (i.e., the same adjacency information) and the same v-structures (i.e., the same induced subgraphs of the form \(a \longrightarrow b \longleftarrow c\)). However, the direction of some edges may be undetermined, in the sense that they point one way in one DAG in the equivalence class, while they point the other way in another DAG in the equivalence class.

An equivalence class can be uniquely represented by a partially directed graph called (observational) essential graph or CPDAG (completed partially directed acyclic graph). Its edges have the following interpretation:

a directed edge \(a \longrightarrow b\) stands for an arrow that has the same orientation in all representatives of the Markov equivalence class;
an undirected edge a -- b stands for an arrow that is oriented in one way in some representatives of the equivalence class and in the other way in other representatives of the equivalence class.

Note that when plotting the object, undirected and bidirected edges are equivalent.

GES (greedy equivalence search) is a score-based algorithm that greedily maximizes a score function (typically the BIC, passed to the function via the argument score) in the space of (observational) essential graphs in three phases, starting from the empty graph:

Forward phase: In the forward phase, GES moves through the space of essential graphs in steps that correspond to the addition of a single edge in the space of DAGs; the phase is aborted as soon as the score cannot be augmented any more.
Backward phase: In the backward phase, the algorithm performs moves that correspond to the removal of a single edge in the space of DAGs until the score cannot be augmented any more.
Turning phase: In the turning phase, the algorithm performs moves that correspond to the reversal of a single arrow in the space of DAGs until the score cannot be augmented any more.

GES cycles through these three phases until no augmentation of the score is possible any more if iterate = TRUE. Note that the turning phase was not part of the original implementation of Chickering (2002), but was introduced by Hauser and Bühlmann (2012) and shown to improve the overall estimation performance. The original algorithm of Chickering (2002) is reproduced with phase = c("forward", "backward") and iterate = FALSE.

GES has the same purpose as the PC algorithm (see pc). While the PC algorithm is based on conditional independence tests (requiring the choice of an independence test and a significance level, see pc), the GES algorithm is a score-based method (requiring the choice of a score function) and does not depend on conditional independence tests. Since GES always operates in the space of essential graphs, it returns a valid essential graph (or CPDAG) in any case.

Using the argument fixedGaps, one can make sure that certain edges will not be present in the resulting essential graph: if the entry [i, j] of the matrix passed to fixedGaps is TRUE, there will be no edge between nodes \(i\) and \(j\). Using this argument can speed up the execution of GIES and allows the user to account for previous knowledge or other constraints. The argument adaptive can be used to relax the constraints encoded by fixedGaps according to a modification of GES called ARGES (adaptively restricted greedy equivalence search) which has been presented in Nandy, Hauser and Maathuis (2015):

When adaptive = "vstructures" and the algorithm introduces a new v-structure \(a \longrightarrow b \longleftarrow c\) in the forward phase, then the edge \(a - c\) is removed from the list of fixed gaps, meaning that the insertion of an edge between \(a\) and \(c\) becomes possible even if it was forbidden by the initial matrix passed to fixedGaps.
When adaptive = "triples" and the algorithm introduces a new unshielded triple in the forward phase (i.e., a subgraph of three nodes \(a\), \(b\) and \(c\) where \(a\) and \(b\) as well as \(b\) and \(c\) are adjacent, but \(a\) and \(c\) are not), then the edge \(a - c\) is removed from the list of fixed gaps.

With one of the adaptive modifications, the successive application of a skeleton estimation method and GES restricted to an estimated skeleton still gives a consistent estimator of the DAG, which is not the case without the adaptive modification.

References

D.M. Chickering (2002). Optimal structure identification with greedy search. Journal of Machine Learning Research 3, 507--554

A. Hauser and P. Bühlmann (2012). Characterization and greedy learning of interventional Markov equivalence classes of directed acyclic graphs. Journal of Machine Learning Research 13, 2409--2464.

P. Nandy, A. Hauser and M. Maathuis (2015). Understanding consistency in hybrid causal structure learning. arXiv preprint 1507.02608

P. Spirtes, C.N. Glymour, and R. Scheines (2000). Causation, Prediction, and Search, MIT Press, Cambridge (MA).

Examples

Run this code

## Load predefined data
data(gmG)

## Define the score (BIC)
score <- new("GaussL0penObsScore", gmG8$x)

## Estimate the essential graph
ges.fit <- ges(score)

## Plot the estimated essential graph and the true DAG
if (require(Rgraphviz)) {
  par(mfrow=c(1,2))
  plot(ges.fit$essgraph, main = "Estimated CPDAG")
  plot(gmG8$g, main = "True DAG")
  str(ges.fit, max=2)
}
## alternative:
if (require(Matrix)) {
  as(as(ges.fit$essgraph,"graphNEL"),"Matrix")
}

Run the code above in your browser using DataLab