bnlearn (version 4.9.1)

score: Score of the Bayesian network

Description

Compute the score of the Bayesian network.

Usage

# S4 method for bn
score(x, data, type = NULL, ..., by.node = FALSE, debug = FALSE)
# S4 method for bn.naive
score(x, data, type = NULL, ..., by.node = FALSE, debug = FALSE)
# S4 method for bn.tan
score(x, data, type = NULL, ..., by.node = FALSE, debug = FALSE)

# S3 method for bn
logLik(object, data, ...)
# S3 method for bn
AIC(object, data, ..., k = 1)
# S3 method for bn
BIC(object, data, ...)

Value

For score() with by.node = TRUE, a vector of numeric values, the individual node contributions to the score of the Bayesian network. Otherwise, a single numeric value, the score of the Bayesian network.

Arguments

x, object

an object of class bn.

data

a data frame containing the data that will be used to compute the score of the Bayesian network.

type

a character string, the label of a network score. If none is specified, the default score is the Bayesian Information Criterion for both discrete and continuous data sets. See network scores for details.

by.node

a boolean value. If TRUE and the score is decomposable, the function returns the score terms corresponding to each node; otherwise it returns their sum (the overall score of x).

debug

a boolean value. If TRUE a lot of debugging output is printed; otherwise the function is completely silent.

...

extra arguments from the generic method (for the AIC and logLik functions, currently ignored) or additional tuning parameters (for the score function).

k

a numeric value, the penalty coefficient to be used; the default k = 1 gives the expression used to compute the AIC in the context of scoring Bayesian networks.
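For instance, reusing the learning.test data from the Examples below (a minimal sketch): the default k = 1 yields the AIC score, while setting k = log(nrow(data)) / 2 turns the same expression into the BIC.

library(bnlearn)
data(learning.test)
dag = hc(learning.test)
score(dag, learning.test)  # type = NULL defaults to the BIC for this discrete data set
AIC(dag, learning.test)    # default penalty coefficient, k = 1
AIC(dag, learning.test, k = log(nrow(learning.test)) / 2)  # should match the BIC above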

Author

Marco Scutari

Details

Additional arguments of the score() function:

  • iss: the imaginary sample size used by the Bayesian Dirichlet scores (bde, mbde, bds, bdj). It is also known as “equivalent sample size”. The default value is equal to 1.

  • iss.mu: the imaginary sample size for the normal component of the normal-Wishart prior in the Bayesian Gaussian score (bge). The default value is 1.

  • iss.w: the imaginary sample size for the Wishart component of the normal-Wishart prior in the Bayesian Gaussian score (bge). The default value is ncol(data) + 2.

  • nu: the mean vector of the normal component of the normal-Wishart prior in the Bayesian Gaussian score (bge). The default value is equal to colMeans(data).

  • l: the number of scores to average in the locally averaged Bayesian Dirichlet score (bdla). The default value is 5.

  • exp: a list of indexes of experimental observations (those that have been artificially manipulated). Each element of the list must be named after one of the nodes, and must contain a numeric vector with indexes of the observations whose value has been manipulated for that node.

  • k: the penalty coefficient to be used by the AIC, BIC and penalized node-average log-likelihood scores. The default value is 1 for AIC, log(nrow(data)) / 2 for BIC and 1 / nnodes(x) * nrow(data) ^ -0.25 for the node-average log-likelihood scores.

  • gamma: the additional penalty in the extended BIC scores. The default value is 0.5.

  • prior: the prior distribution to be used with the various Bayesian Dirichlet scores (bde, mbde, bds, bdj, bdla) and the Bayesian Gaussian score (bge). Possible values are:

    • uniform (the default).

    • vsp: the Bayesian variable selection prior, which puts a probability of inclusion on parents.

    • marginal: an independent marginal uniform for each arc.

    • cs: the Castelo & Siebes prior, which puts an independent prior probability on each arc and direction.

  • beta: the parameter associated with prior.

    • If prior is uniform, beta is ignored.

    • If prior is vsp, beta is the probability of inclusion of an additional parent. The default is 1/ncol(data).

    • If prior is marginal, beta is the probability of inclusion of an arc. Each direction has a probability of inclusion of beta / 2 and the probability that the arc is not included is therefore 1 - beta. The default value is 0.5, so that arc inclusion and arc exclusion have the same probability.

    • If prior is cs, beta is a data frame with columns from, to and prob specifying the prior probability for a set of arcs. A uniform probability distribution is assumed for the remaining arcs.

  • newdata: the test set whose predictive likelihood will be computed by pred-loglik, pred-loglik-g or pred-loglik-cg. It should be a data frame with the same variables as data.

  • fun: the function that computes the score component for a single node in the custom score. fun must have arguments node, parents, data and args, in this order; in other words, it must have signature function(node, parents, data, args). node will contain the label of the node to be scored (a character string); parents will contain the labels of its parents (a character vector); data will contain the complete data set, with all the variables (a data frame); and args will be a list containing any additional arguments to the score.

  • args: a list containing the optional arguments to fun, for tuning custom score functions.
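
As an illustration of how these tuning arguments are passed to score() through the ... argument (a minimal sketch continuing the snippet above; the train/test split is only for exposition):

## imaginary sample size of the BDe score.
score(dag, learning.test, type = "bde", iss = 10)
## predictive log-likelihood of a held-out test set.
in.train = seq_len(round(0.8 * nrow(learning.test)))
score(hc(learning.test[in.train, ]), learning.test[in.train, ],
      type = "pred-loglik", newdata = learning.test[-in.train, ])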

See Also

network scores, arc.strength, alpha.star.

Examples

data(learning.test)
dag = hc(learning.test)
score(dag, learning.test, type = "bde")

## let's see score equivalence in action!
dag2 = set.arc(dag, "B", "A")
score(dag2, learning.test, type = "bde")

## K2 score on the other hand is not score equivalent.
score(dag, learning.test, type = "k2")
score(dag2, learning.test, type = "k2")

## BDe with a prior.
beta = data.frame(from = c("A", "D"), to = c("B", "F"),
         prob = c(0.2, 0.5), stringsAsFactors = FALSE)
score(dag, learning.test, type = "bde", prior = "cs", beta = beta)

## equivalent to logLik(dag, learning.test)
score(dag, learning.test, type = "loglik")

## equivalent to AIC(dag, learning.test)
score(dag, learning.test, type = "aic")

## custom score, computing BIC manually.
my.bic = function(node, parents, data, args) {

  n = nrow(data)

  if (length(parents) == 0) {

    counts = table(data[, node])
    nparams = dim(counts) - 1
    sum(counts * log(counts / n)) - nparams * log(n) / 2

  }#THEN
  else {

    counts = table(data[, node], configs(data[, parents, drop = FALSE]))
    nparams = ncol(counts) * (nrow(counts) - 1)
    sum(counts * log(prop.table(counts, 2))) - nparams * log(n) / 2

  }#ELSE

}#MY.BIC
score(dag, learning.test, type = "custom", fun = my.bic, by.node = TRUE)
score(dag, learning.test, type = "bic", by.node = TRUE)