Run hierarchical clustering followed by a group-lasso on all the different partitions, then a hierarchical testing procedure. Only for linear regression problems.
fullProcess(X, ...)
# S3 method for default
fullProcess(
X,
y,
control = c("FWER", "FDR"),
alpha = 0.05,
test = partialFtest,
hc = NULL,
fractionSampleMLGL = 1/2,
BHclust = 50,
nCore = NULL,
addRoot = FALSE,
Shaffer = FALSE,
...
)
# S3 method for formula
fullProcess(
formula,
data,
control = c("FWER", "FDR"),
alpha = 0.05,
test = partialFtest,
hc = NULL,
fractionSampleMLGL = 1/2,
BHclust = 50,
nCore = NULL,
addRoot = FALSE,
Shaffer = FALSE,
...
)
a list containing:
output of MLGL function
lambda values maximizing the number of rejects
A vector containing the indices of the selected variables for the first lambdaOpt value
A vector containing the index values of the selected groups for the first lambdaOpt value
Selected groups for the first lambdaOpt value
Selected groups for all lambda values
Control level
Test used in the testing procedure
"FDR" or "FWER"
Elapsed time
X: matrix of size n*p.
...: other parameters for MLGL.
y: vector of size n.
control: either "FDR" or "FWER".
alpha: control level for the testing procedure.
test: test used in the testing procedure. Default is partialFtest.
hc: output of the hclust function. If not provided, hclust is run with the ward.D2 method. The user can also provide the desired method: "single", "complete", "average", "mcquitty", "ward.D", "ward.D2", "centroid", "median".
fractionSampleMLGL: a real between 0 and 1: the fraction of individuals to use in the sample for MLGL (see Details).
BHclust: number of replicates for computing the distance matrix for the hierarchical clustering tree.
nCore: number of cores used for distance computation. Use all cores by default.
addRoot: if TRUE, add a common root containing all the groups.
Shaffer: if TRUE, a Shaffer correction is performed (only if control = "FWER").
formula: an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted.
data: an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula).
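To illustrate the hc argument, the sketch below builds a tree over the variables (columns) of X with base R and the ward.D2 linkage. The toy matrix and the Euclidean distance on t(X) are assumptions made for this example; they are not necessarily what fullProcess computes internally when hc is NULL.

```r
set.seed(1)
# Toy design matrix: 30 individuals, 8 variables (illustrative data only)
X <- matrix(rnorm(30 * 8), nrow = 30, ncol = 8)

# Cluster the variables, so distances are taken between the columns of X
# (Euclidean distance is an assumption made for this sketch)
d <- dist(t(X))
hc <- hclust(d, method = "ward.D2")

# hc is an object of class "hclust" with one leaf per variable
class(hc)
length(hc$order)
```

An object built this way is the kind of value the hc argument expects, for instance fullProcess(X, y, hc = hc).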
Quentin Grimonprez
Divide the n individuals into two samples. Then the following three steps are performed:
1) Bootstrap hierarchical clustering of the variables of X
2) MLGL on the second sample of individuals
3) Hierarchical testing procedure on the first sample of individuals
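The sample split described above can be sketched in base R as follows. This is only an illustration: the random draw, the floor() rounding rule and the variable names idxMLGL and idxTest are assumptions, not the package's internal code.

```r
set.seed(42)
n <- 50
fractionSampleMLGL <- 1/2

# Individuals reserved for the MLGL step
# (rounding with floor() is an assumption of this sketch)
idxMLGL <- sample(n, size = floor(n * fractionSampleMLGL))

# The remaining individuals are kept for the hierarchical testing step
idxTest <- setdiff(seq_len(n), idxMLGL)

# The two samples are disjoint and together cover all n individuals
length(idxMLGL)
length(idxTest)
```

Keeping the two samples disjoint means the groups are tested on individuals that were not used to select them.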
MLGL, hierarchicalFDR, hierarchicalFWER, selFDR, selFWER
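library(MLGL)
# least squares loss
set.seed(42)
# 50 individuals with a block-correlated Gaussian design (see ?simuBlockGaussian)
X <- simuBlockGaussian(50, 12, 5, 0.7)
# the response depends on variables 2, 7 and 12, plus Gaussian noise
y <- X[, c(2, 7, 12)] %*% c(2, 2, -2) + rnorm(50, 0, 0.5)
res <- fullProcess(X, y)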
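A variant of the example above, shown as a hedged sketch: it sets control and alpha explicitly, using only arguments listed in the usage section. The value alpha = 0.1 is an arbitrary choice for illustration, and the requireNamespace() guard is added only so the snippet does nothing when MLGL is not installed.

```r
if (requireNamespace("MLGL", quietly = TRUE)) {
  library(MLGL)
  set.seed(42)
  X <- simuBlockGaussian(50, 12, 5, 0.7)
  y <- X[, c(2, 7, 12)] %*% c(2, 2, -2) + rnorm(50, 0, 0.5)

  # Control the FDR instead of the FWER, at level 0.1 (illustrative choice)
  resFDR <- fullProcess(X, y, control = "FDR", alpha = 0.1)

  # List the components of the returned list
  names(resFDR)
}
```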