Function to perform indirect bagging and subagging.
Usage

# S3 method for data.frame
inbagg(formula, data, pFUN = NULL,
       cFUN = list(model = NULL, predict = NULL, training.set = NULL),
       nbagg = 25, ns = 0.5, replace = FALSE, ...)
Value

An object of class "inbagg", that is a list with elements:

mtrees: a list of length nbagg, describing the prediction models corresponding to each bootstrap sample. Each element of mtrees is a list with elements bindx (observations of the bag sample), btree (classifying function of the bag sample) and bfct (predictive models for the intermediates of the bag sample). See the sketch after this list.

y: vector of response values.

W: data frame of intermediate variables.

X: data frame of explanatory variables.
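For illustration, a minimal sketch of how the returned list can be inspected; it assumes the DATA and pFUN objects constructed in the Examples section at the end of this page and simply repeats the call shown there.

RES <- inbagg(y ~ w1 + w2 + w3 ~ x1 + x2 + x3, data = DATA, pFUN = pFUN)

length(RES$mtrees)       # nbagg = 25 entries, one per bootstrap sample
RES$mtrees[[1]]$bindx    # observations of the first bag sample
RES$mtrees[[1]]$btree    # classifying function of the first bag sample
RES$mtrees[[1]]$bfct     # predictive models for the intermediates
str(RES$y)               # stored response values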
Arguments

formula: formula. A formula specified as y~w1+w2+w3~x1+x2+x3 describes how to model the intermediate variables w1, w2, w3 and the response variable y, if no other formula is specified by the elements of pFUN or in cFUN.

data: data frame of explanatory, intermediate and response variables.

pFUN: list of lists describing the models for the intermediate variables; details are given below.

cFUN: either a fixed function with argument newdata, returning the class membership by default, or a list specifying a classifying model similar to one element of pFUN. Details are given below.

nbagg: number of bootstrap samples.

ns: proportion of the learning sample to be drawn into each bootstrap sample. By default, subagging with 50% is performed, i.e. 0.5*n out of n observations are drawn without replacement (see the sketch after this list).

replace: logical. Draw with or without replacement.

...: additional arguments (e.g. subset).
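A sketch of the sampling-related arguments nbagg, ns and replace, again using the DATA and pFUN objects from the Examples section below. The chosen values are purely illustrative; per the argument descriptions above, ns = 1 together with replace = TRUE corresponds to drawing n out of n observations with replacement, i.e. classical bootstrap aggregation rather than subagging.

# default: 25 subagging samples, each with 0.5*n observations drawn without replacement
inbagg(y ~ w1 + w2 + w3 ~ x1 + x2 + x3, data = DATA, pFUN = pFUN)

# classical bagging instead: 50 bootstrap samples of size n, drawn with replacement
inbagg(y ~ w1 + w2 + w3 ~ x1 + x2 + x3, data = DATA, pFUN = pFUN,
       nbagg = 50, ns = 1, replace = TRUE)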
Details

A given data set is subdivided into three types of variables: explanatory, intermediate and response variables.

Each specified intermediate variable is modelled separately following pFUN, a list of lists with elements specifying an arbitrary number of models for the intermediate variables and an optional element training.set = c("oob", "bag", "all"). The element training.set determines whether the predictive models for the intermediates are calculated based on the out-of-bag sample ("oob", the default), on the bag sample ("bag") or on all available observations ("all"). The elements of pFUN specifying the models for the intermediate variables are lists as described in inclass. Note that, if no formula is given in these elements, the functional relationship of formula is used.
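For instance, the pFUN used in the Examples section below can be written out with comments as follows; the choices of lm, rpart and the formula are just illustrative, and an optional training.set element, as described above, could additionally be supplied.

pFUN <- list(
  # first model: linear model for w1 with its own formula and a prediction
  # wrapper (mypredict.lm is a user-defined helper, see the Examples)
  list(formula = w1 ~ x1 + x2, model = lm, predict = mypredict.lm),
  # second model: rpart; no formula given, so the functional relationship
  # of the main formula argument is used
  list(model = rpart)
)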
The response variable is modelled following cFUN. This can either be a fixed classifying function, as described in Peters et al. (2003), or a list which specifies the modelling technique to be applied. The list contains the arguments model (the model to be fitted), predict (optional, how to predict), formula (optional, of type y~w1+w2+w3+x1+x2, determining the variables the classifying function is based on) and the optional argument training.set = c("fitted.bag", "original", "fitted.subset"), which specifies whether the classifying function is trained on the predicted observations of the bag sample ("fitted.bag"), on the original observations ("original") or on the predicted observations not included in a defined subset ("fitted.subset"). By default the formula specified in formula determines the variables the classifying function is based on.

Note that the default cFUN = list(model = NULL, training.set = "fitted.bag") uses the function rpart and the predict function predict(object, newdata, type = "class").
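A sketch of cFUN specifications built from the list elements described above; the concrete formula and training.set values are illustrative, model = NULL relies on the documented rpart default, and DATA and pFUN are again taken from the Examples section below.

# default behaviour written out explicitly: rpart classifier, predicted with
# type = "class", trained on the fitted observations of the bag sample
inbagg(y ~ w1 + w2 + w3 ~ x1 + x2 + x3, data = DATA, pFUN = pFUN,
       cFUN = list(model = NULL, predict = NULL, training.set = "fitted.bag"))

# same classifier, but based on the predicted intermediates plus one original
# explanatory variable, and trained on the original observations
inbagg(y ~ w1 + w2 + w3 ~ x1 + x2 + x3, data = DATA, pFUN = pFUN,
       cFUN = list(model = NULL, formula = y ~ w1 + w2 + w3 + x1,
                   training.set = "original"))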
References

David J. Hand, Hua Gui Li and Niall M. Adams (2001), Supervised classification with structured class definitions. Computational Statistics & Data Analysis 36, 209--225.

Andrea Peters, Berthold Lausen, Georg Michelson and Olaf Gefeller (2003), Diagnosis of glaucoma by indirect classifiers. Methods of Information in Medicine 1, 99--103.
library("MASS")
library("rpart")
y <- as.factor(sample(1:2, 100, replace = TRUE))
W <- mvrnorm(n = 200, mu = rep(0, 3), Sigma = diag(3))
X <- mvrnorm(n = 200, mu = rep(2, 3), Sigma = diag(3))
colnames(W) <- c("w1", "w2", "w3")
colnames(X) <- c("x1", "x2", "x3")
DATA <- data.frame(y, W, X)
pFUN <- list(list(formula = w1~x1+x2, model = lm, predict = mypredict.lm),
list(model = rpart))
inbagg(y~w1+w2+w3~x1+x2+x3, data = DATA, pFUN = pFUN)
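As a follow-up sketch (not part of the original example), class predictions for new observations can be obtained with the package's predict method for "inbagg" objects; here the stored explanatory data are simply reused as newdata.

RES <- inbagg(y ~ w1 + w2 + w3 ~ x1 + x2 + x3, data = DATA, pFUN = pFUN)
predict(RES, newdata = data.frame(X))   # predicted classes for the explanatory data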