big.oem: Orthogonalizing EM for big.matrix objects

Description

Orthogonalizing EM for big.matrix objects

Usage

big.oem(
  x,
  y,
  family = c("gaussian", "binomial"),
  penalty = c("elastic.net", "lasso", "ols", "mcp", "scad", "mcp.net", "scad.net",
    "grp.lasso", "grp.lasso.net", "grp.mcp", "grp.scad", "grp.mcp.net", "grp.scad.net",
    "sparse.grp.lasso"),
  weights = numeric(0),
  lambda = numeric(0),
  nlambda = 100L,
  lambda.min.ratio = NULL,
  alpha = 1,
  gamma = 3,
  tau = 0.5,
  groups = numeric(0),
  penalty.factor = NULL,
  group.weights = NULL,
  standardize = TRUE,
  intercept = TRUE,
  maxit = 500L,
  tol = 1e-07,
  irls.maxit = 100L,
  irls.tol = 0.001,
  compute.loss = FALSE,
  gigs = 4,
  hessian.type = c("full", "upper.bound")
)

Value

An object with S3 class "oem"

Arguments

x

input big.matrix object pointing to design matrix Each row is an observation, each column corresponds to a covariate

y

numeric response vector of length nobs.

family

"gaussian" for least squares problems, "binomial" for binary response. "binomial" currently not available.

penalty

Specification of penalty type. Choices include:

"elastic.net" - elastic net penalty, extra parameters: "alpha"
"lasso" - lasso penalty
"ols" - ordinary least squares
"mcp" - minimax concave penalty, extra parameters: "gamma"
"scad" - smoothly clipped absolute deviation, extra parameters: "gamma"
"mcp.net" - minimax concave penalty + l2 penalty, extra parameters: "gamma", "alpha"
"scad.net" - smoothly clipped absolute deviation + l2 penalty, extra parameters: "gamma", "alpha"
"grp.lasso" - group lasso penalty
"grp.lasso.net" - group lasso penalty + l2 penalty, extra parameters: "alpha"
"grp.mcp" - group minimax concave penalty, extra parameters: "gamma"
"grp.scad" - group smoothly clipped absolute deviation, extra parameters: "gamma"
"grp.mcp.net" - group minimax concave penalty + l2 penalty, extra parameters: "gamma", "alpha"
"grp.scad.net" - group smoothly clipped absolute deviation + l2 penalty, extra parameters: "gamma", "alpha"
"sparse.grp.lasso" - sparse group lasso penalty (group lasso + lasso), extra parameters: "tau"

Careful consideration is required for the group lasso, group MCP, and group SCAD penalties. Groups as specified by the groups argument should be chosen in a sensible manner.

weights

observation weights. Not implemented yet. Defaults to 1 for each observation (setting weight vector to length 0 will default all weights to 1)

lambda

A user supplied lambda sequence. By default, the program computes its own lambda sequence based on nlambda and lambda.min.ratio. Supplying a value of lambda overrides this.

nlambda

The number of lambda values - default is 100.

lambda.min.ratio

Smallest value for lambda, as a fraction of lambda.max, the (data derived) entry value (i.e. the smallest value for which all coefficients are zero). The default depends on the sample size nobs relative to the number of variables nvars. If nobs > nvars, the default is 0.0001, close to zero. If nobs < nvars, the default is 0.01. A very small value of lambda.min.ratio will lead to a saturated fit when nobs < nvars.

alpha

mixing value for elastic.net, mcp.net, scad.net, grp.mcp.net, grp.scad.net. penalty applied is (1 - alpha) * (ridge penalty) + alpha * (lasso/mcp/mcp/grp.lasso penalty)

gamma

tuning parameter for SCAD and MCP penalties. must be >= 1

tau

mixing value for sparse.grp.lasso. penalty applied is (1 - tau) * (group lasso penalty) + tau * (lasso penalty)

groups

A vector of describing the grouping of the coefficients. See the example below. All unpenalized variables should be put in group 0

penalty.factor

Separate penalty factors can be applied to each coefficient. This is a number that multiplies lambda to allow differential shrinkage. Can be 0 for some variables, which implies no shrinkage, and that variable is always included in the model. Default is 1 for all variables.

group.weights

penalty factors applied to each group for the group lasso. Similar to penalty.factor, this is a number that multiplies lambda to allow differential shrinkage. Can be 0 for some groups, which implies no shrinkage, and that group is always included in the model. Default is sqrt(group size) for all groups.

standardize

Logical flag for x variable standardization, prior to fitting the models. The coefficients are always returned on the original scale. Default is standardize = TRUE. If variables are in the same units already, you might not wish to standardize. Keep in mind that standardization is done differently for sparse matrices, so results (when standardized) may be slightly different for a sparse matrix object and a dense matrix object

intercept

Should intercept(s) be fitted (default = TRUE) or set to zero (FALSE)

maxit

integer. Maximum number of OEM iterations

tol

convergence tolerance for OEM iterations

irls.maxit

integer. Maximum number of IRLS iterations

irls.tol

convergence tolerance for IRLS iterations. Only used if family != "gaussian"

compute.loss

should the loss be computed for each estimated tuning parameter? Defaults to FALSE. Setting to TRUE will dramatically increase computational time

gigs

maximum number of gigs of memory available. Used to figure out how to break up calculations involving the design matrix x

hessian.type

only for logistic regression. if hessian.type = "full", then the full hessian is used. If hessian.type = "upper.bound", then an upper bound of the hessian is used. The upper bound can be dramatically faster in certain situations, ie when n >> p

References

Huling. J.D. and Chien, P. (2022), Fast Penalized Regression and Cross Validation for Tall Data with the oem Package. Journal of Statistical Software 104(6), 1-24. doi:10.18637/jss.v104.i06

Examples

Run this code

if (FALSE) {
set.seed(123)
nrows <- 50000
ncols <- 100
bkFile <- "bigmat.bk"
descFile <- "bigmatk.desc"
bigmat <- filebacked.big.matrix(nrow=nrows, ncol=ncols, type="double",
                                backingfile=bkFile, backingpath=".",
                                descriptorfile=descFile,
                                dimnames=c(NULL,NULL))

# Each column value with be the column number multiplied by
# samples from a standard normal distribution.
set.seed(123)
for (i in 1:ncols) bigmat[,i] = rnorm(nrows)*i

y <- rnorm(nrows) + bigmat[,1] - bigmat[,2]

fit <- big.oem(x = bigmat, y = y, 
               penalty = c("lasso", "elastic.net", 
                           "ols", 
                           "mcp",       "scad", 
                           "mcp.net",   "scad.net",
                           "grp.lasso", "grp.lasso.net",
                           "grp.mcp",   "grp.scad",
                           "sparse.grp.lasso"), 
               groups = rep(1:20, each = 5))
               
fit2 <- oem(x = bigmat[,], y = y, 
            penalty = c("lasso", "grp.lasso"), 
            groups = rep(1:20, each = 5))   
           
max(abs(fit$beta[[1]] - fit2$beta[[1]]))            

layout(matrix(1:2, ncol = 2))
plot(fit)
plot(fit, which.model = 2)
}

Run the code above in your browser using DataLab