score: A function to learn a DAG model by the hill climbing algorithm

Description

This function is to efficiently implement the hill climbing algorithm for learning a DAG by optimizing a score. It can also build an emsemble of DAGs based on bootstrap resamples of the data.

Usage

score(Y, n.boot=0, score.type="BIC", threshold=0, max.step=500,  ini.adj.matrix=NULL, 
blacklist=NULL, whitelist=NULL, standardize=TRUE,  standardize.boot=TRUE, 
random.forest=FALSE, random.step.length=NULL, 
nrestart=0, perturb=0, shuffle=FALSE, print=FALSE, EPS=1e-06)

Arguments

an n by p data matrix: n -- sample size, p -- number of variables

n.boot

an integer: the number of bootstrap resamples of the data matrix Y, default = 0, meaning no bootstrapping

score.type

a string: "BIC" or "likelihood", default = "BIC"

threshold

a nonnegative scalar: the cutoff value for the change of the score to decide whether to stop the search; default = 0, meaning stop search when score is not improved

max.step

an integer: the maximum number of search steps of the hill climbing algorithm, default = 500

ini.adj.matrix

a p by p 0-1 matrix: the initial graph, default = NULL, meaning the empty graph

blacklist

a p by p 0-1 matrix: if the (i,j)th-entry is "1", then the edge i-->j will be excluded from the DAG during the search, default= NULL

whitelist

a p by p 0-1 matrix: if the (i,j)th-entry is "1", then the edge i-->j will always be included in the DAG during the search, default=NULL

standardize

logic: whether to standardize the data to have mean zero and sd one, default = TRUE

standardize.boot

logic: whether to standardize the bootstrap resamples, default = TRUE

random.forest

logic: whether to use the "random forest" idea for further variance reduction, default=FALSE

random.step.length

a vector: specify ``random forest" steps

nrestart

an integer: number of times to restart the search algorithm after a local optimal is achieved. The purpose is to search for global optimal, default=0, meaning no restart.

perturb

an integer: how many random addition/deletion/reversal operations should be used in each random restart, default = 0, corresponding to no restart.

shuffle

logic: whether to shuffle the order of variables before DAG learning. The purpose is to avoid potential systematic biases in simulation studies (due to possible coincidence of the topological ordering and the nominal ordering of the variables), default=FALSE

logic: whether print the step information, default= FALSE

EPS

a scalar: a number to indicate what small values will be treated as zero, default = 1e-06, meaning that values with magnitude smaller than 1e-6 will be treated as zero.

Value

adj.matrix: adjacency matrix for the estimated DAG
final.step: a number recording how many search steps are conducted before the procedure stops
movement: a matrix recording the selected operation, \ addition, deletion or reversal of an edge, at each search step
delta.min: a vector recording the change of score due to the selected operation at each search step
bic.ok: a vector recording whether there is error in score calculation at each step
adj.matrix: adjacency matrices for the estimated DAGs based on n.boot bootstrap resamples

Details

score implements the hill climbing algorithm. It can be used to build an ensemble of DAGs (in form of adjacency matrices) based on bootstrap resamples of the data.

References

Wang, R. and Peng, J. (2013) Learning directed acyclic graphs via bootstrap aggregating. arXiv:1406.2098

Examples

Run this code

  data(example)
  Y.n=example$Y  # data matrix
  true.dir=example$true.dir  # adjacency matrix of the data generating DAG
  true.ske=example$true.ske  # skeleton graph of the data generating DAG
  
  
  temp=score(Y=Y.n, n.boot=0, score.type="BIC") ## learn DAG using "BIC" score
  adj=temp$adj.matrix
  
  sum(adj)  # number of edges 
  
  
  # Find the total number of skeleton edges and the number of correct skeleton edges 
  # from the adjacency matrix  of an estimated DAG.
  # Then compared with the true.ske.
  
    diag(adj)=0
    tt=adj+t(adj) ##symmetrization
    correct.c=sum((tt>0)&(true.ske>0))/2    
    total.c=sum(tt>0)/2
    total.true=sum(true.ske>0)/2
    c(total.c, correct.c, total.true)

 # Build an ensemble of DAGs based on bootstrap resamples of the data 
  temp.boot=score(Y.n, n.boot=10, score.type="BIC")
  boot.adj=temp.boot$adj.matrix

Run the code above in your browser using DataLab