randomGLMpredictor(
x, y, xtest = NULL,
classify = TRUE,
nBags = 100,
replace = TRUE,
nObsInBag = if (replace) nrow(x) else as.integer(0.632 * nrow(x)),
nFeaturesInBag = ceiling(ifelse(ncol(x)
x
.TRUE
) or as a continuous variable (FALSE
)?x
)."robustY=F"
.y
based on out-of-bag samples. In case of a continuous outcome, this is the predicted value based on out-of-bag samples.
y for test data for binary outcomes. In case of a continuous outcome, this is the test set predicted value.
nBags rows and nObsInBag columns, giving the indices of observations selected for each bag.
nBags rows and columns corresponding to features, indicating which features are selected as candidate regression covariates in each bag.
nBags rows and columns corresponding to features, indicating which features/covariates are selected into the final regression model in each bag.
nBags rows and columns corresponding to features, giving the final generalized linear model coefficients for features in each bag.
The randomGLMpredictor function requires the R package MASS since it makes use of the function stepAIC. Basically, randomGLMpredictor first selects bootstrap samples and features randomly for each bag, and then restricts the analysis to features that are highly correlated with the outcome. Prediction in each bag is made based on forward stepwise regression (logistic for binary outcomes, linear for quantitative outcomes). An overall prediction is obtained by averaging results from all bags. Values of nCandidateCovariates above 100 are generally not recommended, because the forward selection process is time-consuming. If "nBags=1, replace=FALSE, nObsInBag=nrow(x)" is used, the function becomes a stepwise generalized linear model predictor without bagging.
## binary outcome prediction
# data generation
data(iris)
iris=iris[1:100,]
iris$Species = as.factor(as.character(iris$Species))
set.seed(1)
indx=sample(100, 67, replace=FALSE)
alldat1=iris[indx, ]
alldat2=iris[-indx,]
dat1=alldat1[,-5]
y1=alldat1[,5]
dat2=alldat2[,-5]
y2=alldat2[,5]
# predict with a small number of bags - normally nBags should be at least 100.
RGLM = randomGLMpredictor(dat1, y1, dat2, nCandidateCovariates=ncol(dat1), nBags=30)
y2predict = RGLM$predictedTest
table(y2predict, y2)
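The procedure described in the details (random bags, correlation screening, forward stepwise selection with stepAIC, averaging over bags) can be sketched in a few lines of R for a quantitative outcome. This is only an illustrative sketch, not the package's implementation: the helpers bag_size and bagged_step_glm, their argument names, and the default values of n_bags, n_features and n_candidates are assumptions made for this example. Only the bag size rule (nrow(x) with replacement, otherwise as.integer(0.632 * nrow(x))) is taken from the usage above. The sketch assumes x and xtest are data frames of numeric features.

```r
library(MASS)  # provides stepAIC()

# Bag size rule, mirroring the nObsInBag default shown in the usage.
bag_size <- function(n, replace = TRUE) {
  if (replace) n else as.integer(0.632 * n)
}

# Hypothetical helper (not the package's code): bagged forward stepwise
# linear regression for a quantitative outcome y.
bagged_step_glm <- function(x, y, xtest, n_bags = 20, n_features = ncol(x),
                            n_candidates = 5, replace = TRUE) {
  pred <- matrix(NA_real_, nrow = nrow(xtest), ncol = n_bags)
  for (b in seq_len(n_bags)) {
    # 1. Draw observations and features at random for this bag.
    rows  <- sample(nrow(x), bag_size(nrow(x), replace), replace = replace)
    feats <- sample(ncol(x), n_features)
    xb <- x[rows, feats, drop = FALSE]
    yb <- y[rows]

    # 2. Keep the features with the largest absolute correlation with y.
    r <- vapply(xb, function(col) abs(cor(col, yb)), numeric(1))
    r[is.na(r)] <- 0  # a feature that is constant within the bag ranks last
    keep <- names(sort(r, decreasing = TRUE))[seq_len(min(n_candidates, length(r)))]

    # 3. Forward selection by AIC, starting from the intercept-only model.
    dat  <- data.frame(yb = yb, xb[, keep, drop = FALSE])
    fit0 <- glm(yb ~ 1, data = dat, family = gaussian)
    fit  <- stepAIC(fit0, scope = list(lower = ~1, upper = reformulate(keep)),
                    direction = "forward", trace = 0)

    pred[, b] <- predict(fit, newdata = xtest, type = "response")
  }
  # 4. Average the per-bag predictions.
  rowMeans(pred)
}

# Usage on the built-in mtcars data: predict mpg from the other columns.
set.seed(1)
test_rows <- sample(nrow(mtcars), 10)
train <- mtcars[-test_rows, ]
test  <- mtcars[test_rows, ]
pred  <- bagged_step_glm(train[, -1], train$mpg, test[, -1])
cbind(predicted = pred, observed = test$mpg)
```

For a binary outcome the same loop would presumably fit a logistic model (family = binomial) and average predicted probabilities; that variant is left out here to keep the sketch short.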