bn.cv(data, bn, loss = NULL, ..., algorithm.args = list(),
loss.args = list(), fit = "mle", fit.args = list(), method = "k-fold",
cluster = NULL, debug = FALSE)

# S3 method for bn.kcv
plot(x, ..., main, xlab, ylab, connect = FALSE)
# S3 method for bn.kcv.list
plot(x, ..., main, xlab, ylab, connect = FALSE)
Arguments

data: a data frame containing the variables in the model.

bn: either a character string (the label of the structure learning algorithm to be applied to the training data) or an object of class bn (a fixed network structure).

loss: a character string, the label of a loss function. If none is specified, a default appropriate for the network and the data is used.

algorithm.args: a list of extra arguments to be passed to the learning algorithm.

loss.args: a list of extra arguments to be passed to the loss function specified by loss.

fit: a character string, the label of the method used to fit the parameters of the network. See bn.fit for details.

fit.args: additional arguments for the parameter estimation procedure; see bn.fit for details.

method: a character string, either k-fold, custom-folds or hold-out. See below for details.

cluster: an optional cluster object from package parallel. See the parallel integration documentation for details and a simple example.

debug: a boolean value. If TRUE a lot of debugging output is printed; otherwise the function is completely silent.

x: an object of class bn.kcv or bn.kcv.list returned by bn.cv().

...: other objects of class bn.kcv or bn.kcv.list to plot alongside the first.

main, xlab, ylab, connect: plotting arguments: the title, the axis labels, and whether to connect the expected loss values of each object with a line.

Value

bn.cv() returns an object of class bn.kcv.list if runs is at least 2, an object of class bn.kcv if runs is equal to 1.

Details

k-fold cross-validation: the data are split in k subsets of equal size. For each subset in turn, bn is fitted (and possibly learned as well) on the other k - 1 subsets and the loss function is then computed using that subset. Loss estimates for each of the k subsets are then combined to give an overall loss for data.
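For instance, a basic k-fold run might look like the following (a minimal sketch; the k and runs values are arbitrary, and learning.test ships with bnlearn):

```r
library(bnlearn)

# 10-fold cross-validation of hill-climbing, repeated 5 times;
# the loss function is left at its default here.
cv = bn.cv(learning.test, 'hc', method = "k-fold", k = 10, runs = 5)
cv  # a bn.kcv.list object, since runs > 1
```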
hold-out cross-validation: k subsamples of size m are sampled independently without replacement from the data. For each subsample, bn is fitted (and possibly learned) on the remaining nrow(data) - m samples and the loss function is computed on the m observations in the subsample. The overall loss estimate is the average of the k loss estimates from the subsamples.

custom-folds cross-validation: the folds are specified manually through the folds argument and are then used as in k-fold cross-validation; unlike in k-fold cross-validation, they are not constrained to have the same size.

If cross-validation is performed with multiple runs, the overall loss is the average of the loss estimates from the different runs.

Cross-validation methods accept the following optional arguments:
k: a positive integer number, the number of groups into which the data will be split (in k-fold cross-validation) or the number of times the data will be split in training and test samples (in hold-out cross-validation).

m: a positive integer number, the size of the test set in hold-out cross-validation.

runs: a positive integer number, the number of times k-fold or hold-out cross-validation will be run.

folds: a list in which each element corresponds to one fold and contains the indices of the observations included in that fold.

The following loss functions are implemented:

Log-Likelihood Loss (logl): also known as negative entropy or negentropy, it is the negated expected log-likelihood of the test set for the Bayesian network fitted from the training set.
Gaussian Log-Likelihood Loss (logl-g): the negated expected log-likelihood for Gaussian Bayesian networks.

Classification Error (pred): the prediction error for a single node in a discrete network. Frequentist predictions are used, so the values of the target node are predicted using only the information present in its local distribution (from its parents).

Posterior Classification Error (pred-lw and pred-lw-cg): similar to the above, but predictions are computed from an arbitrary set of nodes using likelihood weighting to obtain Bayesian posterior estimates. pred-lw applies to discrete Bayesian networks, pred-lw-cg to (discrete nodes in) hybrid networks.

Predictive Correlation (cor): the correlation between the observed and the predicted values for a single node in a Gaussian Bayesian network.

Posterior Predictive Correlation (cor-lw and cor-lw-cg): similar to the above, but predictions are computed from an arbitrary set of nodes using likelihood weighting to obtain Bayesian posterior estimates. cor-lw applies to Gaussian networks and cor-lw-cg to (continuous nodes in) hybrid networks.

Mean Squared Error (mse): the mean squared error between the observed and the predicted values for a single node in a Gaussian Bayesian network.

Posterior Mean Squared Error (mse-lw and mse-lw-cg): similar to the above, but predictions are computed from an arbitrary set of nodes using likelihood weighting to obtain Bayesian posterior estimates. mse-lw applies to Gaussian networks and mse-lw-cg
to (continuous nodes in) hybrid networks.

Optional arguments that can be specified in loss.args are:

target: a character string, the label of the target node for prediction in all loss functions but logl, logl-g and logl-cg.

from: a vector of character strings, the labels of the nodes used to predict the target node in pred-lw, pred-lw-cg, cor-lw, cor-lw-cg, mse-lw and mse-lw-cg. The default is to use all the other nodes in the network. Loss functions pred, cor and mse implicitly predict only from the parents of the target node.

n: a positive integer, the number of particles used by likelihood weighting for pred-lw, pred-lw-cg, cor-lw, cor-lw-cg, mse-lw and mse-lw-cg. The default value is 500.
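For example, these options can be combined to request a posterior classification error with an explicit predictor set (a sketch; the node labels belong to learning.test and the particular values are arbitrary):

```r
library(bnlearn)

# predict node F from nodes B and E only, using 1000 likelihood
# weighting particles for each prediction.
cv = bn.cv(learning.test, 'hc', loss = "pred-lw",
           loss.args = list(target = "F", from = c("B", "E"), n = 1000))
cv
```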
Note that if bn is a Bayesian network classifier, pred and pred-lw both give exact posterior predictions computed using the closed-form formulas for naive Bayes and TAN.

plot() accepts one or more objects of class bn.kcv or bn.kcv.list (the first as the x argument, the remaining as the ... argument) and plots the respective expected loss values side by side. For a bn.kcv object, this means a single point; for a bn.kcv.list object this means a boxplot.

See Also

bn.boot, rbn, bn.kcv-class.
Examples

bn.cv(learning.test, 'hc', loss = "pred", loss.args = list(target = "F"))
folds = list(1:2000, 2001:3000, 3001:5000)
bn.cv(learning.test, 'hc', loss = "logl", method = "custom-folds",
folds = folds)
bn.cv(gaussian.test, 'mmhc', method = "hold-out", k = 5, m = 50, runs = 2)
## Not run: ------------------------------------
# gaussian.subset = gaussian.test[1:50, ]
# cv.gs = bn.cv(gaussian.subset, 'gs', runs = 10)
# cv.iamb = bn.cv(gaussian.subset, 'iamb', runs = 10)
# cv.inter = bn.cv(gaussian.subset, 'inter.iamb', runs = 10)
# plot(cv.gs, cv.iamb, cv.inter,
# xlab = c("Grow-Shrink", "IAMB", "Inter-IAMB"), connect = TRUE)
## ---------------------------------------------
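Since bn.cv() also accepts a cluster argument, the folds can be processed in parallel; a minimal sketch using the parallel package (the number of workers is arbitrary):

```r
library(bnlearn)
library(parallel)

# run the cross-validation folds on 2 worker processes.
cl = makeCluster(2)
cv.par = bn.cv(gaussian.test, 'mmhc', k = 10, cluster = cl)
stopCluster(cl)
cv.par
```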