
BGData (version 2.4.1)

FWD: Performs Forward Regressions

Description

Performs a forward regression of y on the columns of X. Predictors are added one at a time; at each step, the function adds the predictor that produces the largest reduction in the residual sum of squares (RSS). The function returns estimates and summaries for the entire forward search. The search is similar to that of step() with direction = 'forward'; however, FWD() is optimized for computational speed in linear models with very large sample sizes.

To achieve fast computations, the function first computes the sufficient statistics X'X and X'y. At each step, it finds the predictor that produces the largest reduction in the RSS (this can be derived from X'X, X'y and the current estimates of the effects), and then updates the estimated effects of the resulting model using Gauss-Seidel iterations on the linear system (X'X)b = X'y, iterating only over the elements of b that are active in the model.
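The selection step can be illustrated with a minimal base-R sketch. It is not the BGData implementation: the function name rss_reduction() is hypothetical, the model has no intercept for brevity, and the active effects are refit with solve() rather than with Gauss-Seidel iterations. It relies on a standard least-squares identity: if b holds the least-squares effects of the active set A and e = y - Xb, then adding predictor j reduces the RSS by (X_j'e)^2 / (X_j'X_j - X_j'X_A (X_A'X_A)^-1 X_A'X_j), and every term, including X'e = X'y - (X'X)b, is available from X'X, X'y and b.

```r
# Hypothetical sketch of the selection step, using only XtX = X'X and
# Xty = X'y (not the BGData source). b holds least-squares effects for the
# columns in `active` and zeros elsewhere.
rss_reduction <- function(XtX, Xty, b, active) {
  Xte <- drop(Xty - XtX %*% b)  # X'e, where e = y - Xb
  score <- rep(NA_real_, ncol(XtX))
  for (j in setdiff(seq_len(ncol(XtX)), active)) {
    denom <- XtX[j, j]
    if (length(active) > 0) {
      # sum of squares of column j left after projecting out the active columns
      XjA <- XtX[j, active, drop = FALSE]
      denom <- denom -
        drop(XjA %*% solve(XtX[active, active, drop = FALSE], t(XjA)))
    }
    score[j] <- Xte[j]^2 / denom  # exact RSS reduction from adding column j
  }
  score
}

set.seed(1)
n <- 200; p <- 5
X <- matrix(rnorm(n * p), n, p)
y <- drop(X %*% c(2, 0, -1, 0, 0.5) + rnorm(n))
XtX <- crossprod(X)     # X'X
Xty <- crossprod(X, y)  # X'y (p x 1)

active <- integer(0)
b <- rep(0, p)
for (step in 1:2) {
  score <- rss_reduction(XtX, Xty, b, active)
  j <- which.max(score)  # which.max() ignores the NA scores of active columns
  cat(sprintf("step %d: add X%d (RSS reduction %.2f)\n", step, j, score[j]))
  active <- c(active, j)
  b <- rep(0, p)
  b[active] <- solve(XtX[active, active, drop = FALSE], Xty[active])
}
```

Once X'X and X'y are formed, the cost of scoring the candidates depends on p and on the size of the active set but not on n, which is why working with sufficient statistics pays off when the sample size is very large.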

Usage

FWD(y, X, df = 20, tol = 1e-7, maxIter = 1000, centerImpute = TRUE,
    verbose = TRUE)

Value

A list with two entries:

  • B: A (p x (df+1)) matrix with the estimated effects of each predictor (rows) at each step of the forward search (columns).

  • path: A data frame providing the order in which variables were added to the model (variable) and statistics for each step of the forward search (RSS, LogLik, VARE (the residual variance), DF, AIC, and BIC).
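All of the statistics reported in path can be derived from the RSS of each step. The sketch below assumes the usual Gaussian likelihood with the maximum-likelihood variance estimate VARE = RSS/n, and counts DF as the number of effects plus one for the residual variance, which is the convention of stats::logLik() for lm fits; FWD() may define VARE or DF differently. The helper path_stats() is hypothetical.

```r
# Hypothetical helper: Gaussian log-likelihood, AIC and BIC from the RSS of a
# model with k effects fitted to n observations.
# Assumption: VARE = RSS / n (ML estimate) and DF = k + 1 (effects + variance).
path_stats <- function(RSS, n, k) {
  VARE <- RSS / n
  LogLik <- -n / 2 * (log(2 * pi * VARE) + 1)
  DF <- k + 1
  data.frame(RSS = RSS, LogLik = LogLik, VARE = VARE, DF = DF,
             AIC = -2 * LogLik + 2 * DF,
             BIC = -2 * LogLik + log(n) * DF)
}

set.seed(2)
n <- 100
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n)
fit <- lm(y ~ x)
stats_row <- path_stats(sum(residuals(fit)^2), n, k = 2)  # intercept + slope
print(stats_row)
```

Under these assumptions the values agree with logLik(), AIC() and BIC() applied to the corresponding lm() fit.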

Arguments

y

The response vector (numeric nx1).

X

An (nxp) numeric matrix. Columns are the features (aka predictors) considered in the forward search. The rows of X must be in the same order as the entries of y.

df

Defines the maximum number of predictors to be included in the model. For a complete forward search, set df = ncol(X).

tol

A tolerance parameter that controls when the Gauss-Seidel algorithm stops.

maxIter

The maximum number of iterations of the Gauss-Seidel algorithm; it is reached only if the tolerance criterion has not stopped the algorithm earlier.

centerImpute

Whether to center the columns of X and impute the missing values with the column means.

verbose

Use verbose = TRUE to print summaries of the forward search.
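The centerImpute, tol and maxIter arguments can be illustrated with two hypothetical base-R helpers, center_impute() and gauss_seidel(). They are a sketch, not the BGData code; in particular, the stopping rule used here (largest absolute change of any active effect during one sweep below tol) is an assumption.

```r
# Hypothetical helper mimicking centerImpute = TRUE: center each column of X,
# then replace missing entries with 0 (the column mean after centering).
center_impute <- function(X) {
  mu <- colMeans(X, na.rm = TRUE)
  X <- sweep(X, 2, mu)  # subtract each column's mean
  X[is.na(X)] <- 0
  X
}

# Hypothetical Gauss-Seidel solver for (X'X)b = X'y that updates only the
# active elements of b; inactive elements are left unchanged.
# Assumed stopping rule: largest change in a sweep < tol, or maxIter sweeps.
gauss_seidel <- function(XtX, Xty, b, active, tol = 1e-7, maxIter = 1000) {
  for (iter in seq_len(maxIter)) {
    maxChange <- 0
    for (j in active) {
      # solve equation j for b[j], holding the other elements at current values
      bNew <- (Xty[j] - sum(XtX[j, -j] * b[-j])) / XtX[j, j]
      maxChange <- max(maxChange, abs(bNew - b[j]))
      b[j] <- bNew
    }
    if (maxChange < tol) break
  }
  b
}

set.seed(3)
n <- 500; p <- 4
X <- matrix(rnorm(n * p), n, p)
X[sample(length(X), 20)] <- NA  # introduce some missing values
Xc <- center_impute(X)
y <- drop(Xc %*% c(1, -2, 0, 0.5)) + rnorm(n)
XtX <- crossprod(Xc)
Xty <- drop(crossprod(Xc, y))
active <- c(1, 2, 4)
b <- gauss_seidel(XtX, Xty, rep(0, p), active)
print(round(b, 3))
```

Gauss-Seidel converges for any symmetric positive-definite system, so in this sketch it converges whenever the active columns of X are linearly independent; the inactive effect b[3] stays at its starting value of 0.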