bnlearn (version 3.1)

bn.cv: Cross-validation for Bayesian networks

Description

Perform a k-fold cross-validation for a learning algorithm or a fixed network structure.

Usage

bn.cv(data, bn, loss = NULL, k = 10, algorithm.args = list(),
  loss.args = list(), fit = "mle", fit.args = list(),
  cluster = NULL, debug = FALSE)

Arguments

data
a data frame containing the variables in the model.
bn
either a character string (the label of the learning algorithm to be applied to the training data in each iteration) or an object of class bn (a fixed network structure); a sketch of the latter usage follows this argument list.
loss
a character string, the label of a loss function. If none is specified, the default loss function is the Log-Likelihood Loss for both discrete and continuous data sets. See below for additional details.
k
a positive integer, the number of groups into which the data will be split.
algorithm.args
a list of extra arguments to be passed to the learning algorithm.
loss.args
a list of extra arguments to be passed to the loss function specified by loss.
fit
a character string, the label of the method used to fit the parameters of the network. See bn.fit for details.
fit.args
additional arguments for the parameter estimation procedure; see again bn.fit for details.
cluster
an optional cluster object from package snow. See snow integration for details and a simple example; a minimal sketch also closes the Examples section below.
debug
a boolean value. If TRUE a lot of debugging output is printed; otherwise the function is completely silent.
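
A minimal sketch of passing a fixed structure as bn instead of a learning algorithm, assuming the usual structure of the learning.test data set included with bnlearn:

# cross-validate a fixed network structure rather than a learned one
library(bnlearn)
dag <- model2network("[A][C][F][B|A][D|A:C][E|B:F]")
bn.cv(learning.test, dag, loss = "logl", k = 10)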

Value

  • An object of class bn.kcv.

Details

The following loss functions are implemented:

  • Log-Likelihood Loss (logl): also known as negative entropy or negentropy, it is the negated expected log-likelihood of the test set for the Bayesian network fitted from the training set.
  • Gaussian Log-Likelihood Loss (logl-g): the negated expected log-likelihood for Gaussian Bayesian networks.
  • Classification Error (pred): the prediction error for a single node (specified by the target parameter in loss.args) in a discrete network.

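A short sketch requesting a loss function by its label, assuming the gaussian.test data set included with bnlearn; loss.args is only needed for losses such as pred that take extra parameters:

# select the Gaussian log-likelihood loss explicitly and use 5 folds
library(bnlearn)
bn.cv(gaussian.test, "hc", loss = "logl-g", k = 5)
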
References

Koller D, Friedman N (2009). Probabilistic Graphical Models: Principles and Techniques. MIT Press.

See Also

bn.boot, rbn, bn.kcv-class.

Examples

bn.cv(learning.test, 'hc', loss = "pred", loss.args = list(target = "F"))
#
#  k-fold cross-validation for Bayesian networks
#
#  target learning algorithm:             Hill-Climbing
#  number of subsets:                     10
#  loss function:                         Classification Error
#  expected loss:                         0.509
#
bn.cv(gaussian.test, 'mmhc')
#
#  k-fold cross-validation for Bayesian networks
#
#  target learning algorithm:             Max-Min Hill Climbing
#  number of subsets:                     10
#  loss function:                         Log-Likelihood Loss (Gaussian)
#  expected loss:                         10.63062
#
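
A hedged sketch of parallel cross-validation via the cluster argument, assuming package snow is installed and two local socket workers can be started:

# distribute the cross-validation folds over a two-worker snow cluster
library(snow)
cl <- makeCluster(2, type = "SOCK")
bn.cv(learning.test, "hc", cluster = cl)
stopCluster(cl)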