The PCHC Bayesian network learning algorithm

Description

The PCHC Bayesian network learning algorithm.

Usage

pchc(x, method = "pearson", alpha = 0.05, robust = FALSE, ini.stat = NULL,
R = NULL, restart = 10, score = "bic-g", blacklist = NULL, whitelist = NULL)

Arguments

x

A numerical matrix with the variables. If you have a data.frame (i.e. categorical data), turn it into a matrix using data.frame.to_matrix. Note that for categorical data the numbers must start from 0. No missing data are allowed.

method

If you have continuous data, you can choose either "pearson" or "spearman". If you have categorical data though, this must be "cat". In this case, make sure the minimum value of each variable is zero. The g2test and the relevant functions work that way.
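The zero-based coding requirement for categorical data can be illustrated with a small base R sketch. It does not use data.frame.to_matrix; it only assumes that subtracting 1 from the integer codes of each factor yields the kind of matrix described above. The data frame and its column names are made up for illustration.

```r
# Hypothetical categorical data; column names and levels are invented.
df <- data.frame(
  smoker = factor(c("no", "yes", "no", "yes")),
  grade  = factor(c("low", "mid", "high", "mid"),
                  levels = c("low", "mid", "high"))
)

# as.integer() on a factor returns codes starting at 1,
# so subtract 1 to make every column start at 0.
x_cat <- sapply(df, function(col) as.integer(col) - 1L)

# Sanity check: the minimum of each column should be 0.
apply(x_cat, 2, min)
```

A matrix prepared this way would then be passed with method = "cat".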

alpha

The significance level for assessing the p-values.

robust

Should outliers be removed prior to applying the PCHC algorithm? If yes, set this to TRUE to use the minimum covariance determinant (MCD).

ini.stat

If the initial test statistics (univariate associations) are available, pass them through this parameter.

R

If the correlation matrix is available, pass it here.

restart

An integer, the number of random restarts.

score

A character string, the label of the network score to be used in the algorithm. If none is specified, the default score is the Bayesian Information Criterion for both discrete and continuous data sets. The available scores for continuous variables are: "bic-g" (default), "loglik-g", "aic-g" or "bge". The available scores for categorical variables are: "bde", "loglik" or "bic".

blacklist

A data frame with two columns (optionally labeled "from" and "to"), containing a set of arcs not to be included in the graph.

whitelist

A data frame with two columns (optionally labeled "from" and "to"), containing a set of arcs to be included in the graph.
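As a rough illustration of the two-column format, the sketch below builds a blacklist and a whitelist in base R. The variable names V1 to V5 are hypothetical, and the sketch assumes that arc labels should match the column names of x; the final check only verifies that assumption on the example data.

```r
set.seed(1)
x <- matrix(rnorm(200 * 5), nrow = 200)
colnames(x) <- paste0("V", 1:5)   # hypothetical variable names

# One arc per row, read as "from" -> "to".
bl <- data.frame(from = c("V1", "V2"), to = c("V3", "V3"))
wl <- data.frame(from = "V4", to = "V5")

# Check that every arc endpoint refers to an existing column of x
# (assumption: labels are matched against the column names).
all(unlist(bl) %in% colnames(x))
all(unlist(wl) %in% colnames(x))
```

These data frames would then be supplied as pchc(x, blacklist = bl, whitelist = wl).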

Value

A list including:

ini

A list including the output of the pchc.skel function.

dag

A "bn" class output. A list including the outcome of the Hill-Climbing phase. See the package "bnlearn" for more details.

scoring

The score value.

runtime

The duration of the algorithm.
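A short sketch of how the returned list might be inspected, using the component names listed above. It runs only if the pchc package is installed, and it assumes that bnlearn (named above as the source of the "bn" class) is available to list the arcs. The simulated data are arbitrary.

```r
# Run only when the pchc package is available.
if (requireNamespace("pchc", quietly = TRUE)) {
  set.seed(1)
  x <- matrix(rnorm(400 * 10), nrow = 400)
  a <- pchc::pchc(x)

  names(a)     # components described above: ini, dag, scoring, runtime
  a$scoring    # the score value
  a$runtime    # the duration of the algorithm

  # List the arcs only if a "bn" object was actually returned.
  if (inherits(a$dag, "bn")) {
    print(bnlearn::arcs(a$dag))
  }
}
```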

Details

The PC algorithm, as proposed by Spirtes et al. (2000), is applied first, followed by a scoring phase such as hill climbing.
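To give a feel for the first phase, here is a deliberately simplified base R sketch. It is not the package's implementation: it performs only marginal (order-zero) Pearson correlation tests, whereas the PC algorithm also tests conditional independence given subsets of the other variables. The simulated variables and the adjacency matrix G are assumptions made for illustration.

```r
set.seed(1)
n  <- 400
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.5)   # depends on x1
x3 <- rnorm(n)                  # generated independently of x1 and x2
x  <- cbind(x1, x2, x3)

alpha <- 0.05
p <- ncol(x)
G <- matrix(0, p, p, dimnames = list(colnames(x), colnames(x)))

# Order-zero step only: keep an undirected edge when the marginal
# correlation test rejects independence at level alpha.
for (i in 1:(p - 1)) {
  for (j in (i + 1):p) {
    pval <- cor.test(x[, i], x[, j], method = "pearson")$p.value
    if (pval < alpha) G[i, j] <- G[j, i] <- 1
  }
}
G
```

The nonzero entries of G are the candidate edges that a subsequent scoring phase could then work with.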

References

Tsagris M. (2021). A new scalable Bayesian network learning algorithm with applications to economics. Computational Economics (Accepted for publication).

Spirtes P., Glymour C. and Scheines R. (2001). Causation, Prediction, and Search. The MIT Press, Cambridge, MA, USA, 2nd edition.

Tsamardinos I. and Borboudakis G. (2010) Permutation Testing Improves Bayesian Network Learning. In Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2010. 322-337.

Tsamardinos I., Brown L.E. and Aliferis C.F. (2006). The max-min hill-climbing Bayesian network structure learning algorithm. Machine Learning 65(1): 31-78.

Examples

# NOT RUN {
# simulate a dataset with continuous data
x <- matrix( rnorm(400 * 30, 1, 10), nrow = 400 )
a <- pchc(x)
# }