
rms (version 4.1-3)

predab.resample: Predictive Ability using Resampling

Description

predab.resample is a general-purpose function that is used by functions for specific models. It computes estimates of optimism of, and bias-corrected estimates of, a vector of indexes of predictive accuracy for a model with a specified design matrix, with or without fast backward step-down of predictors. If bw=TRUE, the design matrix x must have been created by ols, lrm, or cph, and predab.resample stores as the kept attribute a logical matrix encoding which factors were selected at each repetition.
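With bw=TRUE, one natural way to read the kept attribute is to tabulate how often each factor, and each distinct set of factors, survived step-down. The sketch below does not call rms. It builds a small hypothetical matrix by hand, assuming one row per repetition and one column per factor, with invented factor names (age, sex, chol). With a real result you would start from something like attr(result, "kept").

```r
# Hypothetical stand-in for the "kept" attribute (assumed layout: one row per
# repetition, one column per factor; TRUE = factor retained by fastbw).
kept <- matrix(c(TRUE, TRUE,  FALSE,
                 TRUE, FALSE, FALSE,
                 TRUE, TRUE,  TRUE,
                 TRUE, FALSE, FALSE),
               nrow = 4, byrow = TRUE,
               dimnames = list(NULL, c("age", "sex", "chol")))

# Proportion of repetitions in which each factor was retained
colMeans(kept)  # age 1.00, sex 0.50, chol 0.25

# How often each distinct set of retained factors occurred
models <- apply(kept, 1, function(r) paste(colnames(kept)[r], collapse = " + "))
table(models)
```

Large swings in these frequencies across repetitions are a sign that the step-down is unstable.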

Usage

predab.resample(fit.orig, fit, measure, 
                method=c("boot","crossvalidation",".632","randomization"),
                bw=FALSE, B=50, pr=FALSE, prmodsel=TRUE,
                rule="aic", type="residual", sls=.05, aics=0,
                tol=1e-12, force=NULL, estimates=TRUE,
                non.slopes.in.x=TRUE, kint=1,
                cluster, subset, group=NULL,
                allow.varying.intercepts=FALSE, debug=FALSE, ...)

Arguments

fit.orig
object containing the original full-sample fit, with the x=TRUE and y=TRUE options specified to the model fitting function. This model should be the FULL model, including all candidate variables that were ever excluded because of poor association with the response.
fit
a function to fit the model, either the original model fit or a fit in a sample. fit has as arguments x, y, iter, penalty, penalty.matrix, xcol, and other arguments passed to …
measure
a function to compute a vector of indexes of predictive accuracy for a given fit. For method=".632" or method="crossvalidation", it makes the most sense for measure to compute only indexes that are independent of sample size. …
method
The default is "boot" for ordinary bootstrapping (Efron, 1983, Eq. 2.10). Use ".632" for Efron's .632 method (Efron, 1983, Section 6 and Eq. 6.10), "crossvalidation" for grouped cross-validation, …
bw
Set to TRUE to do fast backward step-down for each training sample. Default is FALSE.
B
Number of repetitions, default=50. For method="crossvalidation", this is also the number of groups the original sample is split into.
pr
TRUE to print results for each sample. Default is FALSE.
prmodsel
set to FALSE to suppress printing of model selection output such as that from fastbw.
rule
Stopping rule for fastbw, "aic" or "p". Default is "aic" to use Akaike's information criterion.
type
Type of statistic to use in stopping rule for fastbw, "residual" (the default) or "individual".
sls
Significance level for stopping in fastbw if rule="p". Default is .05.
aics
Stopping criterion for rule="aic". Stops deleting factors when chi-square - 2 times d.f. falls below aics. Default is 0.
tol
Tolerance for singularity checking. Passed to fit and fastbw.
force
see fastbw
estimates
see print.fastbw
non.slopes.in.x
set to FALSE if the design matrix x does not have columns for intercepts and these columns are needed
kint
For multiple intercept models such as the ordinal logistic model, you may specify which intercept to use as kint. This affects the linear predictor that is passed to measure.
cluster
Vector containing cluster identifiers. This can be specified only if method="boot". If it is present, the bootstrap is done using sampling with replacement from the clusters rather than from the original records. …
subset
specify a vector of positive or negative integers or a logical vector when you want the measure function to compute measures of accuracy on a subset of the data. The whole dataset is still used for all model development. …
group
a grouping variable used to stratify the sample upon bootstrapping. This allows one to handle k-sample problems, i.e., each bootstrap sample will be forced to select the same number of observations from each level of group as the number appearing in the original dataset.
allow.varying.intercepts
set to TRUE to avoid an error when the number of intercepts varies from fit to fit
debug
set to TRUE to print subscripts of all training and test samples
...
The user may add other arguments here that are passed to fit and measure.
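Depending on method, cluster, and group, the training and test samples are drawn in different ways. The base R sketch below is not the rms code. It shows one plausible way to generate row indexes for three of these schemes: grouped cross-validation with B groups (assuming rows are assigned to groups at random), a cluster bootstrap that resamples whole clusters, and a bootstrap stratified on group. The function names cv_splits, cluster_boot_rows, and strat_boot_rows are hypothetical.

```r
# Hypothetical helpers, not part of rms: each returns row indexes.

# Grouped cross-validation: randomly assign n rows to B groups; each group
# is the test sample once and the other rows form the training sample.
cv_splits <- function(n, B) {
  stopifnot(B >= 2, B <= n)
  groups <- sample(rep_len(seq_len(B), n))
  lapply(seq_len(B), function(k) {
    list(train = which(groups != k), test = which(groups == k))
  })
}

# Cluster bootstrap: draw whole clusters with replacement and keep every
# record of each drawn cluster (a cluster drawn twice appears twice).
cluster_boot_rows <- function(cluster) {
  ids <- unique(cluster)
  # index into ids; avoids sample()'s special case for a length-one vector
  drawn <- ids[sample.int(length(ids), length(ids), replace = TRUE)]
  unlist(lapply(drawn, function(id) which(cluster == id)), use.names = FALSE)
}

# Stratified bootstrap: resample rows with replacement within each level of
# group, so each level keeps its original number of observations.
strat_boot_rows <- function(group) {
  rows <- lapply(split(seq_along(group), group), function(idx) {
    idx[sample.int(length(idx), length(idx), replace = TRUE)]
  })
  unlist(rows, use.names = FALSE)
}

set.seed(1)
sapply(cv_splits(n = 23, B = 5), function(s) length(s$test))  # 5 5 5 4 4
group <- c("trt", "trt", "trt", "ctl", "ctl")
table(group[strat_boot_rows(group)])  # always ctl 2, trt 3
```

Sampling clusters rather than records keeps correlated records together, so each bootstrap sample mimics the dependence structure of the original data.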

Value

A matrix of class "validate" with rows corresponding to indexes computed by measure, and the following columns:
  • index.orig: indexes in the original overall fit
  • training: average indexes in training samples
  • test: average indexes in test samples
  • optimism: average of training minus test, except for method=".632", where it is .632 times (index.orig - test)
  • index.corrected: index.orig - optimism
  • n: number of successful repetitions with the given index non-missing
The matrix also contains an attribute keepinfo if measure returned such an attribute when run on the original fit.
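To make the column definitions concrete for method="boot", here is a minimal base R sketch of the ordinary optimism bootstrap for a single index, the R^2 of a linear model. It is an illustration under stated assumptions, not the rms implementation. Each bootstrap fit is scored on its own bootstrap sample (training) and on the original sample (test), and the helper names r2 and boot_optimism are hypothetical.

```r
# Hypothetical helper: R^2 of a fitted lm on a data frame
# (assumes the response column is named y).
r2 <- function(fit, data) {
  pred <- predict(fit, newdata = data)
  y <- data$y
  1 - sum((y - pred)^2) / sum((y - mean(y))^2)
}

# Sketch of the optimism bootstrap for one index (not the rms code).
boot_optimism <- function(data, formula, B = 50) {
  index.orig <- r2(lm(formula, data = data), data)
  res <- replicate(B, {
    d <- data[sample.int(nrow(data), replace = TRUE), ]  # bootstrap sample
    f <- lm(formula, data = d)
    c(training = r2(f, d), test = r2(f, data))  # test on the original sample
  })
  training <- mean(res["training", ])
  test <- mean(res["test", ])
  optimism <- training - test
  c(index.orig = index.orig, training = training, test = test,
    optimism = optimism, index.corrected = index.orig - optimism,
    n = B)  # all B fits assumed to succeed
}

set.seed(4)
dat <- data.frame(x1 = rnorm(40), x2 = rnorm(40), x3 = rnorm(40))
dat$y <- dat$x1 + rnorm(40)
round(boot_optimism(dat, y ~ x1 + x2 + x3), 3)
```

Because x2 and x3 are pure noise here, the training R^2 tends to exceed the test R^2, and index.corrected pulls the apparent R^2 down accordingly.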

Concept

  • model validation
  • bootstrap
  • predictive accuracy

Details

For method=".632", the program stops with an error unless every observation is omitted from at least one bootstrap sample. Efron's ".632" method was developed for measures that are formulated in terms of per-observation contributions. In general, error measures (e.g., ROC areas) cannot be written in this way, so this function uses a heuristic extension to Efron's formulation in which it is assumed that the average error measure omitting the ith observation is the same as the average error measure omitting any other observation. Weights are then derived for each bootstrap repetition, and weighted averages over the B repetitions can easily be computed.
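A simplified base R sketch of the .632 calculation for one index (R^2 of a linear model) follows. It mirrors two points stated on this page: the check that every observation is omitted at least once, and optimism = .632 times (index.orig - test). It does NOT reproduce the per-repetition weights that predab.resample derives. It simply averages the out-of-bag (omitted-observation) indexes with equal weight. The function name r2_632 is hypothetical.

```r
# Hypothetical sketch of a .632 estimate for R^2 (equal weights, not rms's).
r2_632 <- function(data, formula, B = 50) {
  n <- nrow(data)
  r2 <- function(fit, d) {  # assumes the response column is named y
    pred <- predict(fit, newdata = d)
    1 - sum((d$y - pred)^2) / sum((d$y - mean(d$y))^2)
  }
  index.orig <- r2(lm(formula, data = data), data)
  omitted <- logical(n)  # has each row been left out at least once?
  test <- numeric(B)
  for (b in seq_len(B)) {
    rows <- sample.int(n, n, replace = TRUE)
    oob <- setdiff(seq_len(n), rows)  # observations not drawn this time
    omitted[oob] <- TRUE
    fit <- lm(formula, data = data[rows, ])
    # R^2 needs at least two out-of-bag rows to be defined
    test[b] <- if (length(oob) >= 2) r2(fit, data[oob, ]) else NA
  }
  if (!all(omitted)) stop("some observations were never omitted; increase B")
  test <- mean(test, na.rm = TRUE)
  optimism <- 0.632 * (index.orig - test)
  c(index.orig = index.orig, test = test, optimism = optimism,
    index.corrected = index.orig - optimism)
}

set.seed(5)
dat <- data.frame(x = rnorm(30))
dat$y <- dat$x + rnorm(30)
round(r2_632(dat, y ~ x, B = 40), 3)
```

Note that index.corrected equals .368 times index.orig plus .632 times test, which is the familiar form of Efron's .632 estimator.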

References

Efron B, Tibshirani R (1997). Improvements on cross-validation: The .632+ bootstrap method. JASA 92:548-560.

See Also

rms, validate, fastbw, lrm, ols, cph, bootcov, setPb

Examples

# See the code for validate.ols for an example of the use of
# predab.resample
