fullProcess: Full process of MLGL

Description

Run hierarchical clustering followed by a group-lasso on all the different partitions and a hierarchical testing procedure. Only for linear regression problems.

Usage

fullProcess(X, ...)
# S3 method for default
fullProcess(
  X,
  y,
  control = c("FWER", "FDR"),
  alpha = 0.05,
  test = partialFtest,
  hc = NULL,
  fractionSampleMLGL = 1/2,
  BHclust = 50,
  nCore = NULL,
  addRoot = FALSE,
  Shaffer = FALSE,
  ...
)
# S3 method for formula
fullProcess(
  formula,
  data,
  control = c("FWER", "FDR"),
  alpha = 0.05,
  test = partialFtest,
  hc = NULL,
  fractionSampleMLGL = 1/2,
  BHclust = 50,
  nCore = NULL,
  addRoot = FALSE,
  Shaffer = FALSE,
  ...
)

Value

a list containing:

res: output of MLGL function
lambdaOpt: lambda values maximizing the number of rejections
var: A vector containing the indices of the selected variables for the first lambdaOpt value
group: A vector containing the group indices of the selected variables for the first lambdaOpt value
selectedGroups: Selected groups for the first lambdaOpt value
reject: Selected groups for all lambda values
alpha: Control level
test: Test used in the testing procedure
control: "FDR" or "FWER"
time: Elapsed time

Arguments

X: matrix of size n*p
...: Other parameters passed to MLGL
y: vector of size n.
control: either "FDR" or "FWER"
alpha: control level for testing procedure
test: test used in the testing procedure. Default is partialFtest
hc: output of hclust function. If not provided, hclust is run with ward.D2 method. User can also provide the desired method: "single", "complete", "average", "mcquitty", "ward.D", "ward.D2", "centroid", "median".
fractionSampleMLGL: a real between 0 and 1: the fraction of individuals to use in the sample for MLGL (see Details).
BHclust: number of replicates for computing the distance matrix for the hierarchical clustering tree
nCore: number of cores used for distance computation. Use all cores by default.
addRoot: If TRUE, add a common root containing all the groups
Shaffer: If TRUE, a Shaffer correction is performed (only if control = "FWER")
formula: an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted.
data: an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula).
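
As a minimal sketch of the hc argument, the snippet below builds a tree with hclust and passes it to fullProcess. The simulated X and y and the choice of the "average" method are illustrative assumptions. The variables (columns) are clustered, so distances are computed on t(X). The call to fullProcess is only attempted when MLGL is installed.

```r
set.seed(42)

# Illustrative data (assumption): 50 individuals, 12 Gaussian variables
X <- matrix(rnorm(50 * 12), nrow = 50, ncol = 12)
y <- drop(X[, c(2, 7, 12)] %*% c(2, 2, -2)) + rnorm(50, 0, 0.5)

# Cluster the variables, i.e. the columns of X, hence the transpose
hc <- hclust(dist(t(X)), method = "average")

if (requireNamespace("MLGL", quietly = TRUE)) {
  # Pass the precomputed tree instead of letting fullProcess build one
  res <- MLGL::fullProcess(X, y, hc = hc)
}
```

According to the description of hc above, passing the method name as a string (for example hc = "average") is an alternative to supplying the tree itself.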

Author

Quentin Grimonprez

Details

The n individuals are divided into two samples. The following three steps are then carried out: 1) bootstrap hierarchical clustering of the variables of X; 2) MLGL on the second sample of individuals; 3) hierarchical testing procedure on the first sample of individuals.
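
The split into two samples can be pictured with the base R sketch below. It is an illustration under stated assumptions, not the package's internal code: the names indMLGL and indTest are hypothetical, and it assumes that fractionSampleMLGL is the share of individuals assigned to the MLGL step, with the remainder kept for testing.

```r
set.seed(1)
n <- 50
fractionSampleMLGL <- 1/2

# Hypothetical split: individuals drawn at random for the group-lasso (MLGL) step
indMLGL <- sample(n, floor(n * fractionSampleMLGL))

# The remaining individuals are kept for the hierarchical testing step
indTest <- setdiff(seq_len(n), indMLGL)

length(indMLGL)
length(indTest)
```

Using disjoint samples for selection and testing means that the groups being tested were not chosen using the same observations.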

Examples

# least squares loss
set.seed(42)
X <- simuBlockGaussian(50, 12, 5, 0.7)
y <- X[, c(2, 7, 12)] %*% c(2, 2, -2) + rnorm(50, 0, 0.5)
res <- fullProcess(X, y)
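
The returned list can then be inspected through the components listed in the Value section. The sketch below repeats the example above so that it is self-contained and only runs when MLGL is installed. No output is shown because it depends on the random split of the individuals.

```r
if (requireNamespace("MLGL", quietly = TRUE)) {
  set.seed(42)
  X <- MLGL::simuBlockGaussian(50, 12, 5, 0.7)
  y <- X[, c(2, 7, 12)] %*% c(2, 2, -2) + rnorm(50, 0, 0.5)
  res <- MLGL::fullProcess(X, y)

  # lambda value(s) maximizing the number of rejections
  print(res$lambdaOpt)
  # groups selected for the first lambdaOpt value
  print(res$selectedGroups)
  # indices of the variables belonging to the selected groups
  print(res$var)
  # control type and level used by the testing procedure
  print(res$control)
  print(res$alpha)
}
```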
