ipred (version 0.4-6)

errorest: Estimators for the Prediction Error

Description

Resampling based estimates of prediction error (misclassification or mean squared error).

Usage

errorest(formula, data, subset, na.action, model=NULL, predict=NULL,
         iclass=NULL, estimator=c("cv", "boot", "632plus"), 
         est.para=list(k = 10, nboot = 25), ...)

Arguments

formula
formula. Either describing the model of explanatory and response variables in the usual way (see lm) or the model between explanatory and intermediate variables in the framework of indirect classification (see inclass).
data
data frame containing the variables in the model formula and additionally the class membership variable if model = inclass. data is required for indirect classification, otherwise it is optional.
subset
optional vector, specifying a subset of observations to be used.
na.action
function. Indicates what should happen when the data contain NAs.
model
function. Modelling technique whose error rate is to be estimated. The parameter na.action is evaluated in the modelling process.
predict
function. Prediction method to be used. The vector of predicted values must have the same length as the number of to-be-predicted observations. Predictions corresponding to missing data must be replaced by NA.
iclass
character. Specifying the class membership variable (response) in data in the framework of indirect classification.
estimator
estimator of the misclassification error: "cv" (cross-validation), "boot" (bootstrap) or "632plus" (bias-corrected .632+ bootstrap; classification only).
est.para
a list of additional parameters for the estimator: k for k-fold cross-validation or nboot for the number of bootstrap replications.
...
additional parameters to model.

Value

An object of class errorest, i.e. a list with the following elements:
  • err: estimated misclassification error for a nominal response, or the square root of the estimated mean squared error for a continuous response.
  • estimator: the kind of estimator used.
  • para: additional parameters for the estimator.
  • data.name: names of the variables used.
  • class: logical. TRUE for classification problems.
  • sd: jackknife estimate of the standard deviation of err (if estimator = "boot").
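The components can be accessed from the returned list in the usual way; a minimal sketch, assuming est holds the result of an errorest() call with estimator = "boot":

est$err        # estimated prediction error
est$estimator  # "cv", "boot" or "632plus"
est$sd         # jackknife standard deviation of err (bootstrap only)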

Details

The prediction error for classification and regression models using cross-validation or the bootstrap can be computed by errorest. Any model can be specified as long as it is a function with arguments model(formula, data, subset, na.action, ...). If a predict method predict(object, newdata, ...) is available for the fitted model, predict does not need to be specified. However, predict has to return predicted values directly comparable to the responses. See the examples below.
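For illustration, a wrapper with the required signature might look as follows; the names mymodel and mypredict.class are placeholders for this sketch, not part of the package:

# A minimal sketch of the required interface (names are illustrative):
mymodel <- function(formula, data, subset, na.action, ...)
  MASS::lda(formula, data = data, ...)        # subset/na.action accepted only to match the interface

mypredict.class <- function(object, newdata)
  predict(object, newdata = newdata)$class    # returns values comparable to the response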

k-fold cross-validation and the usual bootstrap estimator with est.para$nboot bootstrap replications can be computed for both classification and regression problems. The bias-corrected .632+ bootstrap estimator by Efron and Tibshirani (1997) is available for classification problems only.
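For a continuous response the same interface applies; since predict.lm() returns fitted values directly comparable to the response, no predict function should be needed. A minimal sketch with simulated data (the data frame dreg is illustrative only):

# 10-fold cross-validated root mean squared error of a linear model
dreg <- data.frame(matrix(rnorm(1100), ncol = 11))
names(dreg) <- c("z", paste("x", 1:10, sep = ""))
errorest(z ~ ., data = dreg, model = lm, estimator = "cv",
         est.para = list(k = 10))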

print.errorest is available for inspection of the results.

References

Bradley Efron and Robert Tibshirani (1997), Improvements on Cross-Validation: The .632+ Bootstrap Estimator. Journal of the American Statistical Association 92(438), 548--560.

Brian D. Ripley (1996), Pattern Recognition and Neural Networks, Cambridge: Cambridge University Press.

David J. Hand, Hua Gui Li, Niall M. Adams (2001), Supervised classification with structured class definitions. Computational Statistics & Data Analysis 36, 209--225.

Examples

library("ipred")
library("MASS")     # lda()
library("rpart")    # rpart(), prune()
library("mlbench")  # Glass data

X <- as.data.frame(matrix(rnorm(1000), ncol = 10))
y <- factor(ifelse(apply(X, 1, mean) > 0, 1, 0))
learn <- cbind(y, X)

mypredict.lda <- function(object, newdata)
  predict(object, newdata = newdata)$class

errorest(y ~ ., data= learn, model=lda, 
         estimator = "cv", predict= mypredict.lda)

# n-fold cv = leave-one-out.

errorest(y ~ ., data= learn, model=lda, 
         estimator = "cv", est.para=list(k = nrow(learn)), 
         predict= mypredict.lda)

errorest(y ~ ., data= learn, model=lda, 
         estimator = "boot", predict= mypredict.lda)

errorest(y ~ ., data= learn, model=lda, 
         estimator = "632plus", predict= mypredict.lda)

attach(learn)
errorest(y ~ V1 + V2 + V3, model=lda, estimator = "cv",
         predict= mypredict.lda)
detach(learn)


mypredict.rpart <- function(object, newdata)
  predict(object, newdata = newdata, type = "class")

errorest(y ~ ., data= learn, model=rpart, estimator = "cv",
         predict=mypredict.rpart)

errorest(y ~ ., data= learn, model=rpart, estimator = "boot",
predict=mypredict.rpart)

errorest(y ~ ., data= learn, model=rpart, estimator = "632plus",
predict=mypredict.rpart)

errorest(y ~ ., data= learn, model=bagging, estimator = "cv",
nbagg=10)

data(Glass)

# LDA has a cross-validated misclassification error of 
# 38% (Ripley, 1996, page 98)
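# A corresponding call (sketch; uses mypredict.lda defined above):
errorest(Type ~ ., data = Glass, model = lda,
         estimator = "cv", predict = mypredict.lda)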


# Pruned trees: about 32% (Ripley, 1996, page 230)

pruneit <- function(formula, ...)
  prune(rpart(formula, ...), cp = 0.01)

errorest(Type ~ ., data = Glass, model = pruneit, estimator = "cv",
         predict = mypredict.rpart)

data(smoking)
# Set three groups of variables:
# 1) explanatory variables are: TarY, NicY, COY, Sex, Age
# 2) intermediate variables are: TVPS, BPNL, COHB
# 3) response (resp) is defined by:

resp <- function(data){
  res <- t(t(data) > c(4438, 232.5, 58))
  res <- as.factor(ifelse(apply(res, 1, sum) > 2, 1, 0))
  res
}

response <- resp(smoking[ ,c("TVPS", "BPNL", "COHB")])
smoking <- cbind(smoking, response)

formula <- TVPS + BPNL + COHB ~ TarY + NicY + COY + Sex + Age

mypredict.inclass <- function(object, newdata){
  res <- predict.inclass(object = object, cFUN = resp, newdata = newdata)
  return(res)
}

# The leave-one-out estimate of the misclassification error is
# 36.36% (Hand et al., 2001), using indirect classification with
# linear models

errorest(formula, data = smoking, model = inclass,
         predict = mypredict.inclass, estimator = "cv",
         iclass = "response", pFUN = lm,
         est.para = list(k = nrow(smoking)))
