This function implements forward selection of linear models almost identically to step with direction = "forward". The reason this is a separate function from fs is that groups of variables (e.g. dummies encoding levels of a categorical variable) must be handled differently in the selective inference framework.
groupfs(x, y, index, maxsteps, sigma = NULL, k = 2, intercept = TRUE,
center = TRUE, normalize = TRUE, aicstop = 0, verbose = FALSE)
x: Matrix of predictors (n by p).
y: Vector of outcomes (length n).
index: Group membership indicator of length p. Check that sort(unique(index)) = 1:G, where G is the number of distinct groups.
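As a minimal sketch of how such a group indicator might be built, the example below expands a factor into dummy columns with base R's model.matrix and reuses its "assign" attribute as the index. The data frame df and its columns are hypothetical and not part of the package; whether this dummy coding is the one best suited to groupfs is left open here.

```r
# Hypothetical data: one numeric predictor and one 3-level factor.
df <- data.frame(
  age   = c(23, 35, 41, 29, 52, 47),
  color = factor(c("red", "blue", "green", "blue", "red", "green"))
)

# Expand to a design matrix, then drop the intercept column
# (groupfs has its own intercept argument).
mm <- model.matrix(~ age + color, data = df)
x <- mm[, -1, drop = FALSE]

# The "assign" attribute maps each column to the term it came from
# (0 = intercept), so the dummy columns of one factor share a label.
index <- attr(mm, "assign")[-1]

# The check recommended above: labels must be exactly 1, ..., G.
G <- length(unique(index))
stopifnot(ncol(x) == length(index))
stopifnot(identical(sort(unique(index)), seq_len(G)))
```

Here age forms a group on its own and the two dummy columns for color form a second group.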
maxsteps: Maximum number of steps for forward stepwise.
sigma: Estimate of error standard deviation for use in AIC criterion. This determines the relative scale between RSS and the degrees of freedom penalty. Default is NULL, corresponding to unknown sigma. When NULL, groupfsInf performs truncated F inference instead of truncated \(\chi\). See extractAIC for details on the AIC criterion.
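The AIC criterion referred to here can be made concrete with base R's extractAIC applied to an lm fit. The sketch below reproduces its two forms by hand, one for an unknown scale and one for a supplied scale. The simulated data and the value s are illustrative assumptions, and this page does not state how groupfs maps sigma onto that scale argument.

```r
# Simulated data (illustrative only).
set.seed(1)
n <- 30
x1 <- rnorm(n)
y <- 1 + 2 * x1 + rnorm(n)
fit <- lm(y ~ x1)

rss <- sum(residuals(fit)^2)
edf <- n - df.residual(fit)   # equivalent degrees of freedom (here 2)
k <- 2                        # penalty multiplier, k = 2 gives AIC

# Unknown scale (scale = 0): n * log(RSS / n) + k * edf.
aic_unknown <- n * log(rss / n) + k * edf

# Supplied scale s > 0: RSS / s - n + k * edf.
s <- 1.5
aic_known <- rss / s - n + k * edf

# Both hand computations agree with stats::extractAIC for lm fits.
stopifnot(isTRUE(all.equal(extractAIC(fit, scale = 0, k = k)[2], aic_unknown)))
stopifnot(isTRUE(all.equal(extractAIC(fit, scale = s, k = k)[2], aic_known)))
```

In the second form the scale fixes how strongly RSS is weighted against the k * edf penalty, which is the trade-off described above.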
k: Multiplier of model size penalty. The default is k = 2 for AIC. Use k = log(n) for BIC, or k = 2 * log(p) for RIC (best for high dimensions, when \(p > n\)). If \(G < p\) then RIC may be too restrictive and it would be better to use log(G) < k < 2 * log(p).
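The penalty choices above can be computed directly. The sketch below uses the same dimensions as the example at the end of this page (20 observations, 40 columns, 20 groups of size 2); the midpoint used for the intermediate penalty is an arbitrary illustrative choice, not a recommendation from the package.

```r
n <- 20                        # number of observations
p <- 40                        # number of predictor columns
index <- sort(rep(1:20, 2))    # 20 groups of 2 columns each
G <- length(unique(index))     # number of groups

k_aic <- 2                     # AIC
k_bic <- log(n)                # BIC
k_ric <- 2 * log(p)            # RIC

# With G < p, pick some k strictly between log(G) and 2 * log(p).
# The midpoint is only one possible choice (an assumption of this sketch).
k_mid <- (log(G) + 2 * log(p)) / 2
stopifnot(log(G) < k_mid, k_mid < k_ric)
```

Any of these values can then be passed as the k argument, e.g. groupfs(x, y, index, maxsteps = 5, k = k_bic).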
intercept: Should an intercept be included in the model? Default is TRUE. Does not count as a step.
center: Should the columns of the design matrix be centered? Default is TRUE.
normalize: Should the design matrix be normalized? Default is TRUE.
aicstop: Early stopping if AIC increases. Default is 0, corresponding to no early stopping. Positive integer values specify the number of times the AIC is allowed to increase in a row, e.g. with aicstop = 2 the algorithm will stop if the AIC criterion increases for 2 steps in a row. The default of step corresponds to aicstop = 1.
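The stopping rule can be illustrated on a made-up sequence of AIC values. The helper steps_before_stop below is hypothetical and not part of the package; it only encodes the rule as worded above, and the actual implementation may differ in details such as whether the final increasing steps are kept in the returned path.

```r
# Hypothetical helper, not part of the package. aic_path[1] is the AIC of
# the starting model and aic_path[s + 1] the AIC after step s. Returns the
# step at which the rule triggers, or the total number of steps if it
# never does.
steps_before_stop <- function(aic_path, aicstop = 0) {
  n_steps <- length(aic_path) - 1
  if (aicstop <= 0 || n_steps < 1) return(n_steps)  # no early stopping
  run <- 0                                          # consecutive increases
  for (s in seq_len(n_steps)) {
    if (aic_path[s + 1] > aic_path[s]) run <- run + 1 else run <- 0
    if (run >= aicstop) return(s)
  }
  n_steps
}

# Made-up AIC values for a 7-step path.
aic_path <- c(100, 90, 85, 86, 84, 85, 87, 80)

steps_before_stop(aic_path, aicstop = 0)  # rule disabled: all steps taken
steps_before_stop(aic_path, aicstop = 1)  # stops at the first increase
steps_before_stop(aic_path, aicstop = 2)  # needs two increases in a row
```

A larger aicstop lets the path run past a temporary increase in AIC, at the cost of more steps.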
verbose: Print out progress along the way? Default is FALSE.
An object of class "groupfs" containing information about the sequence of models in the forward stepwise algorithm. Call the function groupfsInf on this object to compute selective p-values.
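library(selectiveInference)

# Simulate 20 observations on 40 predictors, grouped into 20 pairs of columns.
x = matrix(rnorm(20*40), nrow=20)
index = sort(rep(1:20, 2))
# The outcome depends on columns 1 and 4, which belong to groups 1 and 2.
y = rnorm(20) + 2 * x[,1] - x[,4]
# Run at most 5 steps of grouped forward selection, then compute selective p-values.
fit = groupfs(x, y, index, maxsteps = 5)
out = groupfsInf(fit)
out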