IF.lm: Invisible Fence model selection (Linear Model)

Description

Invisible Fence model selection for linear models (lm), using a parametric bootstrap.

Usage

IF.lm(full, data, B = 100, cpus = 2, lftype = c("abscoef", "pvalue"))

Arguments

full

formula of the full model

data

data frame containing the variables in the full model

B

number of bootstrap samples (parametric bootstrap for lm)

cpus

number of CPUs to run in parallel

lftype

subtractive measure type: "abscoef" (absolute value of the coefficients) or "pvalue" (p-value)

Value

full

the full model

B

the number of bootstrap samples used

freq

the coverage probability of the selected model in each dimension (model size)

size

the number of variables in the parsimonious model

term

the variables included in the full model

model

the variables selected, in order, in the parsimonious model

Note

The current Invisible Fence implementation focuses on variable selection. This routine applies when the subtractive measure is the absolute value of the coefficients, the p-value, or the t-value; however, the method can be extended to other subtractive measures. See Jiang et al. (2011) for more details.
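The three subtractive measures named in the note can all be read off an ordinary lm fit. The sketch below uses only base R; the helper name subtractive_measures and its output columns are hypothetical illustrations, not part of the fence package.

```r
# Hypothetical helper (not in the fence package): for every non-intercept
# term of a fitted lm, return |coefficient|, |t value| and p-value.
# Larger |coef| or |t| and smaller p-value suggest a more important term.
subtractive_measures <- function(fit) {
  tab <- coef(summary(fit))   # columns: Estimate, Std. Error, t value, Pr(>|t|)
  tab <- tab[rownames(tab) != "(Intercept)", , drop = FALSE]
  data.frame(
    term    = rownames(tab),
    abscoef = abs(tab[, "Estimate"]),
    tvalue  = abs(tab[, "t value"]),
    pvalue  = tab[, "Pr(>|t|)"],
    row.names = NULL
  )
}

# Usage on simulated data: x1 and x2 have non-zero effects, x3 does not
set.seed(1)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 + 1 * dat$x2 + rnorm(n)
m <- subtractive_measures(lm(y ~ x1 + x2 + x3, data = dat))
print(m)
m$term[order(-m$abscoef)]   # ranking by |coefficient|, most important first
m$term[order(m$pvalue)]     # ranking by p-value, most important first
```

Note that ranking by |coefficient| depends on the scale of each variable, whereas the t-value and p-value are scale-free.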

Details

This method (Jiang et al., 2011) is motivated by the computational expense of model selection in complex, high-dimensional problems. The idea is that there is a best model in each dimension (model size) of the model space. Bootstrapping determines the coverage probability of the selected model in each dimension. The parsimonious model is the selected model with the highest coverage probability, excluding the full model, whose coverage probability is always 1.
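The procedure above can be sketched in base R for the "abscoef" measure. This is a simplified illustration under stated assumptions, not the fence package's IF.lm implementation: the function if_sketch is hypothetical, the full model is assumed to contain an intercept and at least two candidate variables, the best d-variable model in each bootstrap sample is taken to be the d variables with the largest |coefficient|, and ties in coverage are broken toward the smaller model.

```r
# Hypothetical, simplified Invisible Fence sketch for lm (NOT the IF.lm code).
if_sketch <- function(full, data, B = 100) {
  fit   <- lm(full, data = data)
  X     <- model.matrix(fit)      # design matrix; intercept assumed in column 1
  yhat  <- fitted(fit)
  s_hat <- summary(fit)$sigma
  vars  <- colnames(X)[-1]        # candidate variables
  p     <- length(vars)           # assumes p >= 2
  # chosen[b, d]: the d-variable model picked in bootstrap sample b
  chosen <- matrix("", nrow = B, ncol = p)
  for (b in seq_len(B)) {
    # parametric bootstrap: y* = X beta_hat + e*, e* ~ N(0, sigma_hat^2)
    ystar <- yhat + rnorm(nrow(X), sd = s_hat)
    bhat  <- lm.fit(X, ystar)$coefficients[-1]
    # subtractive measure: rank variables by |coefficient|, largest first
    ranked <- vars[order(abs(bhat), decreasing = TRUE)]
    for (d in seq_len(p)) {
      chosen[b, d] <- paste(sort(ranked[seq_len(d)]), collapse = "+")
    }
  }
  # per dimension: most frequent model and its empirical coverage probability
  models <- character(p)
  freq   <- numeric(p)
  for (d in seq_len(p)) {
    tab       <- table(chosen[, d])
    models[d] <- names(tab)[which.max(tab)]
    freq[d]   <- max(tab) / B
  }
  # exclude the full model (coverage always 1); which.max returns the first
  # maximum, so ties go to the smaller model (an assumption of this sketch)
  size <- which.max(freq[-p])
  list(freq = freq, models = models, size = size,
       model = strsplit(models[size], "+", fixed = TRUE)[[1]])
}

# Usage on simulated data: only x1 and x2 have non-zero effects
set.seed(123)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 + 2 * dat$x2 + rnorm(n)
res <- if_sketch(y ~ x1 + x2 + x3 + x4, data = dat, B = 50)
res$freq
res$model
```

Here the two-variable model x1+x2 should be picked in every bootstrap sample, so its coverage is 1; the frequencies in the other dimensions depend on the random seed.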

References

Jiang J., Rao J.S., Gu Z., Nguyen T. (2008), Fence methods for mixed model selection. The Annals of Statistics, 36(4): 1669-1692.
Jiang J., Nguyen T., Rao J.S. (2011), Invisible fence methods and the identification of differentially expressed gene sets. Statistics and Its Interface, 4: 403-415.

Examples


library(fence)
library(MASS)
library(snow)
r =1234; set.seed(r)
p=10; n=300; rho = 0.6
R = rho^abs(outer(1:p, 1:p, "-"))  # AR(1)-type correlation matrix: R[i,j] = rho^|i-j|
x=mvrnorm(n, rep(0, p), R)
colnames(x)=paste('x',1:p, sep='')
X = cbind(rep(1,n),x)
tbetas = c(1,1,1,0,1,1,0,1,0,0,0)  # true non-zero coefficients: intercept, x1, x2, x4, x5, x7
epsilon = rnorm(n)
y = as.matrix(X)%*%tbetas + epsilon
colnames(y) = 'y'
data = data.frame(cbind(X,y))
full = y ~ x1+x2+x3+x4+x5+x6+x7+x8+x9+x10
# Takes more than 5 seconds (about 17 seconds) to run
# obj1 = IF.lm(full = full, data = data, B = 100, lftype = "abscoef")
# sort(names(obj1$model$coef)[-1])   # selected variables (intercept dropped)
# obj2 = IF.lm(full = full, data = data, B = 100, lftype = "pvalue")
# sort(setdiff(names(data[c(-1,-12)]), names(obj2$model$coef)))   # candidate variables not selected
