Learn R Programming

dagbag (version 1.1)

score: A function to learn a DAG model by the hill climbing algorithm

Description

This function is to efficiently implement the hill climbing algorithm for learning a DAG by optimizing a score. It can also build an emsemble of DAGs based on bootstrap resamples of the data.

Usage

score(Y, n.boot=0, score.type="BIC", threshold=0, max.step=500, ini.adj.matrix=NULL, blacklist=NULL, whitelist=NULL, standardize=TRUE, standardize.boot=TRUE, random.forest=FALSE, random.step.length=NULL, nrestart=0, perturb=0, shuffle=FALSE, print=FALSE, EPS=1e-06)

Arguments

Y
an n by p data matrix: n -- sample size, p -- number of variables
n.boot
an integer: the number of bootstrap resamples of the data matrix Y, default = 0, meaning no bootstrapping
score.type
a string: "BIC" or "likelihood", default = "BIC"
threshold
a nonnegative scalar: the cutoff value for the change of the score to decide whether to stop the search; default = 0, meaning stop search when score is not improved
max.step
an integer: the maximum number of search steps of the hill climbing algorithm, default = 500
ini.adj.matrix
a p by p 0-1 matrix: the initial graph, default = NULL, meaning the empty graph
blacklist
a p by p 0-1 matrix: if the (i,j)th-entry is "1", then the edge i-->j will be excluded from the DAG during the search, default= NULL
whitelist
a p by p 0-1 matrix: if the (i,j)th-entry is "1", then the edge i-->j will always be included in the DAG during the search, default=NULL
standardize
logic: whether to standardize the data to have mean zero and sd one, default = TRUE
standardize.boot
logic: whether to standardize the bootstrap resamples, default = TRUE
random.forest
logic: whether to use the "random forest" idea for further variance reduction, default=FALSE
random.step.length
a vector: specify ``random forest" steps
nrestart
an integer: number of times to restart the search algorithm after a local optimal is achieved. The purpose is to search for global optimal, default=0, meaning no restart.
perturb
an integer: how many random addition/deletion/reversal operations should be used in each random restart, default = 0, corresponding to no restart.
shuffle
logic: whether to shuffle the order of variables before DAG learning. The purpose is to avoid potential systematic biases in simulation studies (due to possible coincidence of the topological ordering and the nominal ordering of the variables), default=FALSE
print
logic: whether print the step information, default= FALSE
EPS
a scalar: a number to indicate what small values will be treated as zero, default = 1e-06, meaning that values with magnitude smaller than 1e-6 will be treated as zero.

Value

If n.boot=0 (no bootstrap), then a list with five components:
adj.matrix
adjacency matrix for the estimated DAG
final.step
a number recording how many search steps are conducted before the procedure stops
movement
a matrix recording the selected operation, \ addition, deletion or reversal of an edge, at each search step
delta.min
a vector recording the change of score due to the selected operation at each search step
bic.ok
a vector recording whether there is error in score calculation at each step
If n.boot>0 (with bootstrap) then a list with one component:
adj.matrix
adjacency matrices for the estimated DAGs based on n.boot bootstrap resamples

Details

score implements the hill climbing algorithm. It can be used to build an ensemble of DAGs (in form of adjacency matrices) based on bootstrap resamples of the data.

References

Wang, R. and Peng, J. (2013) Learning directed acyclic graphs via bootstrap aggregating. arXiv:1406.2098

Examples

Run this code
  data(example)
  Y.n=example$Y  # data matrix
  true.dir=example$true.dir  # adjacency matrix of the data generating DAG
  true.ske=example$true.ske  # skeleton graph of the data generating DAG
  
  
  temp=score(Y=Y.n, n.boot=0, score.type="BIC") ## learn DAG using "BIC" score
  adj=temp$adj.matrix
  
  sum(adj)  # number of edges 
  
  
  # Find the total number of skeleton edges and the number of correct skeleton edges 
  # from the adjacency matrix  of an estimated DAG.
  # Then compared with the true.ske.
  
    diag(adj)=0
    tt=adj+t(adj) ##symmetrization
    correct.c=sum((tt>0)&(true.ske>0))/2    
    total.c=sum(tt>0)/2
    total.true=sum(true.ske>0)/2
    c(total.c, correct.c, total.true)

 # Build an ensemble of DAGs based on bootstrap resamples of the data 
  temp.boot=score(Y.n, n.boot=10, score.type="BIC")
  boot.adj=temp.boot$adj.matrix

Run the code above in your browser using DataLab