impute: Predict or Impute Missing Data from a Bayesian Network

Description

Impute missing values in a data set, or predict the values of a node for new observations, from a Bayesian network.

Usage

# S3 method for bn.fit
predict(object, node, data, method = "parents", ..., prob = FALSE,
  debug = FALSE)
impute(object, data, method = "parents", ..., debug = FALSE)

Arguments

object

an object of class bn.fit for impute, or an object of class bn or bn.fit for predict.

data

a data frame containing the data to be imputed (for impute) or the observations to predict from (for predict). For impute, complete observations are ignored.

node

a character string, the label of the node whose values are to be predicted.

method

a character string, the method used to impute the missing values or predict new ones: either parents or bayes-lw (see Details).

…

additional arguments for the imputation or prediction method, such as n and from for bayes-lw (see Details).

prob

a boolean value. If TRUE and object is a discrete network, the probabilities used for prediction are attached to the predicted values as an attribute called prob.

debug

a boolean value. If TRUE, a lot of debugging output is printed; otherwise the function is completely silent.

Value

predict returns a numeric vector (for Gaussian and conditional Gaussian nodes), a factor (for categorical nodes) or an ordered factor (for ordinal nodes). If prob = TRUE and the network is discrete, the probabilities used for prediction are attached to the predicted values as an attribute called prob.
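With prob = TRUE, the probabilities can be read back with attr(). The sketch below assumes bnlearn is installed and uses its bundled discrete data set learning.test; the network structure passed to model2network() is an assumption chosen for illustration, and the 4000/1000 train/test split is arbitrary.

```r
library(bnlearn)

# learning.test is a discrete data set bundled with bnlearn (variables A-F).
# The structure below is assumed for illustration.
dag = model2network("[A][C][F][B|A][D|A:C][E|B:F]")
fitted = bn.fit(dag, learning.test[1:4000, ])
test = learning.test[4001:5000, ]

# Predict D from its parents and keep the probabilities used for prediction.
pred = predict(fitted, node = "D", data = test, method = "parents", prob = TRUE)
probs = attr(pred, "prob")

# D is a categorical node, so pred is a factor; compare it with the observed levels.
table(predicted = pred, observed = test$D)
```

The cross-tabulation gives a quick sense of how often the most probable level matches the held-out data.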

impute returns a data frame with the same structure as data.

Details

predict returns the predicted values for node given the data specified by data and the fitted network. Depending on the value of method, the predicted values are computed as follows.

parents: the predicted values are computed by plugging the new values of the parents of node into the local probability distribution of node extracted from object.
bayes-lw: the predicted values are computed by averaging likelihood weighting simulations that use all the available nodes as evidence, except the node whose values are being predicted. The number of random samples averaged for each new observation is controlled by the optional argument n; the default is 500. If the variable being predicted is discrete, the predicted level is the one with the highest conditional probability; if it is continuous, the predicted value is the expected value of its conditional distribution. The variables used as evidence can be restricted with the optional argument from; by default, all the relevant variables in data are used.
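To make the two methods concrete, here is a sketch on the Gaussian network used in the examples, assuming bnlearn and its gaussian.test data set. It checks that parents amounts to evaluating the local linear regression of F at its parents' values (read via coef() on the fitted node), and it runs bayes-lw with the optional arguments n and from. The seed, n = 1000 and from = c("A", "D") are illustrative choices, not defaults.

```r
library(bnlearn)

dag = model2network("[A][B][E][G][C|A:B][D|B][F|A:D:E:G]")
fitted = bn.fit(dag, gaussian.test[1:2000, ])
test = gaussian.test[2001:nrow(gaussian.test), ]

# "parents": F is a linear Gaussian node, so its prediction is the intercept
# plus the regression coefficients times the values of its parents.
pred.parents = predict(fitted, node = "F", data = test, method = "parents")
beta = coef(fitted$F)  # intercept first, then one coefficient per parent
by.hand = beta[1] + as.matrix(test[, names(beta)[-1]]) %*% beta[-1]

# "bayes-lw": average n likelihood weighting samples per observation,
# using only A and D as evidence; set.seed makes the run reproducible.
set.seed(42)
pred.lw = predict(fitted, node = "F", data = test, method = "bayes-lw",
                  from = c("A", "D"), n = 1000)

# How far the restricted-evidence predictions drift from the plug-in ones.
summary(pred.parents - pred.lw)
```

Since A and D carry only part of the information in F's parents, the bayes-lw predictions here are expected to differ from the plug-in ones.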

impute is based on predict and can impute missing values with the same methods (parents and bayes-lw). The latter accepts the additional argument n, the number of random samples averaged for each observation.
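A sketch of impute with bayes-lw, again assuming bnlearn and its gaussian.test data set; n = 1000 and the seed are illustrative values. The final checks follow the Value and Arguments sections: the result has the same structure as data, and complete observations come back unchanged.

```r
library(bnlearn)

# Blank out 500 values of F at random.
set.seed(1)
with.missing.data = gaussian.test
miss = sample(nrow(with.missing.data), 500)
with.missing.data[miss, "F"] = NA

dag = model2network("[A][B][E][G][C|A:B][D|B][F|A:D:E:G]")
fitted = bn.fit(dag, gaussian.test)

# Fill in F by likelihood weighting, averaging n = 1000 samples for each
# incomplete observation.
imputed = impute(fitted, with.missing.data, method = "bayes-lw", n = 1000)

# Same dimensions as the input, and no missing values left.
dim(imputed)
anyNA(imputed)
```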

Examples


library(bnlearn)

# impute: blank out 500 values of F, then fill them in from a network
# fitted on the complete data.
with.missing.data = gaussian.test
with.missing.data[sample(nrow(with.missing.data), 500), "F"] = NA
fitted = bn.fit(model2network("[A][B][E][G][C|A:B][D|B][F|A:D:E:G]"),
  gaussian.test)
imputed = impute(fitted, with.missing.data)

# predict: fit on the first 2000 observations, then predict F on the rest.
training = bn.fit(model2network("[A][B][E][G][C|A:B][D|B][F|A:D:E:G]"),
  gaussian.test[1:2000, ])
test = gaussian.test[2001:nrow(gaussian.test), ]
predicted = predict(training, node = "F", data = test)
