shrink.heur: Shrinkage After Estimation Using Heuristic Formulae

Description

Shrink regression coefficients using heuristic formulae, first described by Van Houwelingen and Le Cessie (Stat. Med., 1990).

Usage

shrink.heur(dataset, model, DF, int = TRUE, int.adj = FALSE)

Arguments

dataset

a dataset for regression analysis, given as a matrix with the outcome variable in the final column. Applying the datashape function beforehand is recommended, especially if categorical predictors are present. For regression with an intercept, a column of 1s should be added as the first column of the dataset (see Examples).

model

type of regression model. Either "linear" or "logistic".

DF

the number of degrees of freedom or number of predictors in the model. If DF is missing, its value is estimated automatically; this estimate may be inaccurate for complex models with non-linear terms.

int

logical. If TRUE, the model includes a regression intercept.

int.adj

logical. If TRUE, the regression intercept is re-estimated after the regression coefficients have been shrunk. If FALSE, the intercept is adjusted as described by Harrell (2001).
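For a linear model, a Harrell-style intercept adjustment can be written as below, assuming the standard form in which predictions are shrunk towards the observed mean of the outcome. Here alpha* is the adjusted intercept, beta_j are the raw slopes, x-bar_j the predictor means, y-bar the outcome mean and lambda the shrinkage factor; the second identity follows because ordinary least squares gives alpha-hat = y-bar minus the sum of beta_j times x-bar_j. This is a sketch of the linear case only; the exact computation inside shrink.heur, particularly for logistic models, is not confirmed here.

```latex
\hat{\alpha}^{*} = \bar{y} - \lambda \sum_{j=1}^{p} \hat{\beta}_j \bar{x}_j,
\qquad
\hat{y}^{*} = \hat{\alpha}^{*} + \lambda \sum_{j=1}^{p} \hat{\beta}_j x_j
            = (1 - \lambda)\,\bar{y} + \lambda\,\hat{y}
```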

Value

raw.coeff: the raw (pre-shrinkage) regression model coefficients.
shrunk.coeff: the shrunken regression model coefficients.
lambda: the heuristic shrinkage factor.
DF: the number of degrees of freedom or number of predictors in the model.

Details

This function estimates shrunken regression coefficients using heuristic formulae (see References). A linear or logistic regression model is fitted to the data (with an intercept if int = TRUE), a shrinkage factor is estimated, and the factor is then applied to the regression coefficients. If int.adj = FALSE, the intercept is estimated as described by Harrell (2001). If int.adj = TRUE, the intercept is re-estimated by refitting the model with the shrunken coefficients.
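The int.adj = TRUE route can be sketched in base R for a logistic model: shrink the slopes, then refit an intercept-only model with the shrunken linear predictor supplied as an offset. This is a minimal sketch assuming that offset-based refit; the value of lambda is hypothetical, the predictors are chosen only for illustration, and shrink.heur's internal code may differ.

```r
# Minimal sketch of an int.adj = TRUE style intercept update for a logistic model.
# Assumptions: the refit uses an offset, and `lambda` is a hypothetical value
# (in practice it would come from the heuristic formula).
data(mtcars)
fit <- glm(vs ~ mpg + am, family = binomial, data = mtcars)

lambda <- 0.8                               # hypothetical shrinkage factor
beta_shrunk <- lambda * coef(fit)[-1]       # shrink the slopes, not the intercept

# Linear predictor built from the shrunken slopes only
X <- as.matrix(mtcars[, c("mpg", "am")])
lp <- drop(X %*% beta_shrunk)

# Refit only the intercept, holding the shrunken slopes fixed via the offset
refit <- glm(mtcars$vs ~ 1, family = binomial, offset = lp)
shrunk_coef <- c("(Intercept)" = unname(coef(refit)), beta_shrunk)
print(shrunk_coef)
```

Because the intercept is estimated by maximum likelihood with the slopes held fixed, the summed predicted probabilities again equal the observed number of events.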

The heuristic formulae use the number of model degrees of freedom (or the number of predictors) as a penalty: as the model becomes more complex, more shrinkage is needed and the shrinkage factor moves closer to zero.
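The Van Houwelingen and Le Cessie heuristic is commonly written as lambda = (chisq - DF) / chisq, where chisq is the model likelihood-ratio chi-square. The sketch below computes it in base R, assuming the logistic chi-square is the drop in deviance and the linear one is the Gaussian likelihood-ratio statistic n * log(RSS_null / RSS_model); heuristic_lambda is a hypothetical helper, not part of the package.

```r
# Sketch of the heuristic shrinkage factor: lambda = (chisq - DF) / chisq.
# heuristic_lambda() is a hypothetical helper for illustration only.
heuristic_lambda <- function(chisq, DF) (chisq - DF) / chisq

# Logistic model: likelihood-ratio chi-square = null deviance - residual deviance
data(mtcars)
fit_log <- glm(vs ~ mpg + am, family = binomial, data = mtcars)
chisq_log <- fit_log$null.deviance - fit_log$deviance
lambda_log <- heuristic_lambda(chisq_log, DF = 2)

# Linear model (assumption): Gaussian LR chi-square = n * log(RSS_null / RSS_model)
data(iris)
fit_lin <- lm(Petal.Width ~ Sepal.Length + Sepal.Width + Petal.Length, data = iris)
n <- nrow(iris)
rss0 <- sum((iris$Petal.Width - mean(iris$Petal.Width))^2)
chisq_lin <- n * log(rss0 / sum(residuals(fit_lin)^2))
lambda_lin <- heuristic_lambda(chisq_lin, DF = 3)

# Apply the factor to the slopes; the intercept is handled separately
shrunk_lin <- lambda_lin * coef(fit_lin)[-1]
print(c(logistic = lambda_log, linear = lambda_lin))
```

Holding chisq fixed, a larger DF gives a smaller lambda, which is the penalty behaviour described above.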

References

Harrell, F. E. "Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis." Springer (2001).

Harrell, F. E., Lee, K. L., and Mark, D. B. "Tutorial in biostatistics. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors." Statistics in Medicine (1996) 15:361-387.

Steyerberg, E. "Clinical Prediction Models." Springer (2009).

Van Houwelingen, J. C. and Le Cessie, S. "Predictive value of statistical models." Statistics in Medicine (1990) 9:1303-1325.

Examples


## Example 1: Linear regression using the iris dataset
## shrinkage using a heuristic formula
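data(iris)
# use the four numeric columns; Petal.Width (the last column) is the outcome
iris.data <- as.matrix(iris[, 1:4])
# prepend a column of 1s for the intercept
iris.data <- cbind(1, iris.data)
set.seed(123)
shrink.heur(dataset = iris.data, model = "linear")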

## Example 2: logistic regression using a subset of the mtcars data
## shrinkage using a heuristic formula
data(mtcars)
# outcome: vs (column 8); predictors: mpg, wt and am (columns 1, 6 and 9)
mtc.data <- cbind(1, datashape(mtcars, y = 8, x = c(1, 6, 9)))
head(mtc.data)
set.seed(321)
shrink.heur(dataset = mtc.data, model = "logistic", DF = 3,
            int = TRUE, int.adj = TRUE)
