bn.cv: Cross-validation for Bayesian networks

Description

Perform a k-fold cross-validation for a learning algorithm or a fixed network structure.
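
As a rough illustration of what k-fold cross-validation does, the base R sketch below splits a single discrete variable into k folds, estimates its level probabilities on the training folds, and scores each held-out fold by its negated average log-likelihood. This is not bnlearn's internal code: the variable x, the one-node "model" and the plain mean over folds are all assumptions made for the example.

```r
# Illustrative sketch only: NOT how bn.cv() is implemented internally.
set.seed(42)
x <- factor(sample(c("a", "b", "c"), 200, replace = TRUE,
                   prob = c(0.5, 0.3, 0.2)))
k <- 10

# Randomly assign each observation to one of k folds of equal size.
fold <- sample(rep(seq_len(k), length.out = length(x)))

# For each fold: estimate the level probabilities on the other k - 1 folds
# (maximum likelihood), then score the held-out fold by its negated
# average log-likelihood.
fold.loss <- sapply(seq_len(k), function(i) {
  train <- x[fold != i]
  test <- x[fold == i]
  prob <- prop.table(table(train))
  -mean(log(as.numeric(prob[as.character(test)])))
})

# Summarise the k per-fold losses (a plain mean, chosen for simplicity).
cv.loss <- mean(fold.loss)
```

In bn.cv the same idea applies to a whole network: the structure and parameters are learned on the training folds and the chosen loss is evaluated on the held-out fold.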

Usage

bn.cv(data, bn, loss = NULL, k = 10, algorithm.args = list(),
  loss.args = list(), fit = "mle", fit.args = list(),
  cluster = NULL, debug = FALSE)

Arguments

data

a data frame containing the variables in the model.

bn

either a character string (the label of the learning algorithm to be applied to the training data in each iteration) or an object of class bn (a fixed network structure).

loss

a character string, the label of a loss function. If none is specified, the default loss function is the Classification Error for Bayesian network classifiers; otherwise, the Log-Likelihood Loss for both discrete and con

k

a positive integer, the number of groups into which the data will be split.

algorithm.args

a list of extra arguments to be passed to the learning algorithm.

loss.args

a list of extra arguments to be passed to the loss function specified by loss.

fit

a character string, the label of the method used to fit the parameters of the network. See bn.fit for details.

fit.args

a list of additional arguments for the parameter estimation procedure. See bn.fit for details.

cluster

an optional cluster object from package parallel. See parallel integration for details and a simple example.

debug

a boolean value. If TRUE, a lot of debugging output is printed; otherwise the function is completely silent.
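
The cluster argument can be exercised with a cluster created by the parallel package. The sketch below assumes that two local worker processes are available; the worker count and the object names cl and cv.par are arbitrary choices for the example.

```r
library(parallel)
library(bnlearn)

# Start a small local cluster (2 workers is an arbitrary choice).
cl <- makeCluster(2)

# Pass the cluster to bn.cv() through the cluster argument.
cv.par <- bn.cv(learning.test, "hc", cluster = cl)

# Always release the workers when done.
stopCluster(cl)
```

Passing cluster = NULL (the default) runs the same call sequentially.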

Value

An object of class bn.kcv.

Details

The following loss functions are implemented:

Log-Likelihood Loss (logl): also known as negative entropy or negentropy, it is the negated expected log-likelihood of the test set for the Bayesian network fitted from the training set.
Gaussian Log-Likelihood Loss (logl-g): the negated expected log-likelihood for Gaussian Bayesian networks.
Classification Error (pred): the prediction error for a single node (specified by the target parameter in loss.args) in a discrete network.
Predictive Correlation (cor): the correlation between the observed and the predicted values for a single node (specified by the target parameter in loss.args) in a Gaussian Bayesian network.
Mean Squared Error (mse): the mean squared error between the observed and the predicted values for a single node (specified by the target parameter in loss.args) in a Gaussian Bayesian network.
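
The labels listed above are passed through the loss argument, with the target node supplied in loss.args where one is required. The following sketch uses the learning.test and gaussian.test data sets shipped with bnlearn; the choice of node F as target and the network string given to model2network are picked for illustration only.

```r
library(bnlearn)

# Mean squared error for node F in a Gaussian network learned by hill-climbing.
cv.mse <- bn.cv(gaussian.test, "hc", loss = "mse",
                loss.args = list(target = "F"))

# Predictive correlation for the same node.
cv.cor <- bn.cv(gaussian.test, "hc", loss = "cor",
                loss.args = list(target = "F"))

# Fixed structure: pass an object of class bn instead of an algorithm label.
dag <- model2network("[A][C][F][B|A][D|A:C][E|B:F]")
cv.fixed <- bn.cv(learning.test, dag, loss = "logl")
```

Printing any of the returned objects shows a summary of the cross-validation run.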

References

Koller D, Friedman N (2009). Probabilistic Graphical Models: Principles and Techniques. MIT Press.

Examples

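library(bnlearn)
# classification error for node F; structure learned by hill-climbing in each fold.
bn.cv(learning.test, 'hc', loss = "pred", loss.args = list(target = "F"))

# default loss function; structure learned by MMHC in each fold.
bn.cv(gaussian.test, 'mmhc')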
