EM algorithm to estimate LUCID with one or multiple omics layers
estimate_lucid(
lucid_model = c("early", "parallel", "serial"),
G,
Z,
Y,
CoG = NULL,
CoY = NULL,
K,
init_omic.data.model = "EEV",
useY = TRUE,
tol = 0.001,
max_itr = 1000,
max_tot.itr = 10000,
Rho_G = 0,
Rho_Z_Mu = 0,
Rho_Z_Cov = 0,
family = c("normal", "binary"),
seed = 123,
init_impute = c("mix", "lod"),
init_par = c("mclust", "random"),
verbose = FALSE
)
A list containing the objects below:
res_Beta: estimates for the G -> X associations
res_Mu: estimates of mu (the cluster-specific means) for the X -> Z associations
res_Sigma: estimates of sigma (the cluster-specific variance-covariance matrices) for the X -> Z associations
res_Gamma: estimates for the X -> Y associations
inclusion.p: inclusion probability of cluster assignment for each observation
K: number of latent clusters for "early"; a list of numbers of latent clusters for "parallel" and "serial"
var.names: names for the G, Z, Y variables
init_omic.data.model: pre-specified geometric model of multi-omics data
likelihood: log-likelihood of the converged LUCID model
family: the distribution of the outcome
select: for LUCID early integration only, indicators of whether each exposure and omics feature is selected
useY: whether this LUCID model is supervised
Z: multi-omics data
init_impute: pre-specified imputation method
init_par: pre-specified parameter initialization method
Rho: for LUCID early integration only, the pre-specified penalty (regularization) tuning parameters
N: number of observations
submodel: for LUCID in serial only, storing all the submodels
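A common next step after fitting is to turn inclusion.p into a hard cluster assignment. The sketch below assumes that, for early integration, inclusion.p is an N by K matrix whose rows are posterior cluster probabilities summing to 1; the matrix is made up rather than taken from a real fit, and the 0.6 cut-off for flagging uncertain assignments is arbitrary.

```r
# Made-up posterior probabilities for N = 4 observations and K = 2 clusters,
# standing in for fit$inclusion.p from an early-integration fit (assumed to be
# an N x K matrix whose rows sum to 1).
inclusion_p <- matrix(c(0.90, 0.10,
                        0.20, 0.80,
                        0.55, 0.45,
                        0.30, 0.70),
                      ncol = 2, byrow = TRUE)
# Hard assignment: the column (cluster) with the largest probability per row
cluster <- max.col(inclusion_p, ties.method = "first")
cluster                                  # 1 2 1 2
# Cluster sizes, keeping empty clusters as zero counts
table(factor(cluster, levels = seq_len(ncol(inclusion_p))))
# Flag observations whose best probability falls below an arbitrary 0.6 cut-off
uncertain <- which(apply(inclusion_p, 1, max) < 0.6)
uncertain                                # 3
```

Using ties.method = "first" makes the assignment deterministic when two clusters are equally likely; the default "random" would not be.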
Arguments:
lucid_model: which LUCID model to fit; "early" for early integration, "parallel" for LUCID in parallel, or "serial" for LUCID in serial
G: an N by P matrix of exposures
Z: omics data. If "early", an N by M matrix; if "parallel", a list whose element i is a matrix with N rows and p_i features; if "serial", a list whose elements are each either a matrix with N rows and p_i features or a list of two or more matrices, each with N rows
Y: the outcome, a vector of length N
CoG: an N by V matrix of covariates to adjust for in the G -> X model
CoY: an N by W matrix of covariates to adjust for in the X -> Y model
K: number of latent clusters. If "early", an integer greater than or equal to 2; if "parallel", an integer vector of the same length as Z whose elements are each greater than or equal to 2; if "serial", a list of the same length as Z whose elements are each either an integer (as for "early") or a list of integers (as for "parallel")
init_omic.data.model: a vector of strings specifying the geometric model of the omics data (NULL is also accepted); see ?mclust::mclustModelNames for the available models
useY: logical; if TRUE, the EM algorithm fits a supervised LUCID, otherwise an unsupervised LUCID
tol: convergence tolerance, the stopping criterion for the EM algorithm
max_itr: maximum number of iterations of the EM algorithm. If the EM algorithm runs for more than max_itr iterations without converging, it is forced to stop.
max_tot.itr: maximum total number of iterations for the estimate_lucid function. estimate_lucid may run the EM algorithm multiple times if it fails to converge.
Rho_G: a scalar, the LASSO penalty used to regularize the exposures. To tune the penalty, use the wrapper function lucid. Currently implemented only for LUCID early integration.
Rho_Z_Mu: a scalar, the LASSO penalty used to regularize the cluster-specific means of the omics data (Z). To tune the penalty, use the wrapper function lucid. Currently implemented only for LUCID early integration.
Rho_Z_Cov: a scalar, the graphical LASSO penalty used to estimate sparse cluster-specific variance-covariance matrices of the omics data (Z). To tune the penalty, use the wrapper function lucid. Currently implemented only for LUCID early integration.
family: the distribution of the outcome, either "normal" (continuous) or "binary"
seed: random seed used to initialize the EM algorithm
init_impute: method used to initialize the imputation of missing values in LUCID. "mix" uses mclust::imputeData, which imputes missing values in the omics data with the EM algorithm for the unrestricted general location model from the mix package; "lod" replaces each missing value with LOD / sqrt(2), where the LOD of each omics variable is taken to be its minimum value.
init_par: method used to initialize the parameters of the EM algorithm. For "early", "mclust" initializes the parameters with the mclust package and "random" draws them from a uniform distribution; for "parallel", "mclust" is the default because it converges quickly; for "serial", each sub-model follows the rule for "early" or "parallel", depending on its type.
verbose: a flag indicating whether detailed information on each iteration of the EM algorithm is printed to the console. Default is FALSE.
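The tol and max_itr arguments decide when an EM run stops. As a hedged illustration only, the sketch below runs EM on a two-component univariate Gaussian mixture, not on the LUCID model, and assumes the stopping rule is an absolute change in log-likelihood below tol; em_gmm2 and that rule are hypothetical and do not come from LUCIDus. The restarts governed by max_tot.itr are not modelled.

```r
# Illustrative EM for a two-component univariate Gaussian mixture.
# Not the LUCID likelihood: it only shows how tol and max_itr stop an EM run.
# Assumed stopping rule: |change in log-likelihood| < tol.
em_gmm2 <- function(x, tol = 0.001, max_itr = 1000) {
  mu <- unname(quantile(x, c(0.25, 0.75)))  # deterministic, spread-out start
  sigma <- rep(sd(x), 2)
  prop <- c(0.5, 0.5)
  loglik_old <- -Inf
  for (itr in seq_len(max_itr)) {
    # E-step: weighted component densities and responsibilities
    dens <- cbind(prop[1] * dnorm(x, mu[1], sigma[1]),
                  prop[2] * dnorm(x, mu[2], sigma[2]))
    loglik <- sum(log(rowSums(dens)))
    r <- dens / rowSums(dens)
    # M-step: closed-form updates of proportions, means and SDs
    n_k <- colSums(r)
    prop <- n_k / length(x)
    mu <- colSums(r * x) / n_k
    sigma <- sqrt(colSums(r * outer(x, mu, "-")^2) / n_k)
    # Converged once the log-likelihood barely changes
    if (abs(loglik - loglik_old) < tol) {
      return(list(mu = mu, sigma = sigma, prop = prop, loglik = loglik,
                  iterations = itr, converged = TRUE))
    }
    loglik_old <- loglik
  }
  # max_itr reached without meeting tol: stop anyway and flag it
  list(mu = mu, sigma = sigma, prop = prop, loglik = loglik,
       iterations = max_itr, converged = FALSE)
}

set.seed(2024)
x <- c(rnorm(100, mean = 0), rnorm(100, mean = 5))
fit_ok <- em_gmm2(x)                   # default tol and max_itr
fit_capped <- em_gmm2(x, max_itr = 2)  # cap too low to reach tol
c(fit_ok$converged, fit_capped$converged)
```

A run that hits max_itr returns converged = FALSE instead of raising an error, which mirrors the "forced to stop" behaviour described for max_itr.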
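Rho_G is described as a LASSO (L1) penalty. A standard way to see what an L1 penalty does is the soft-thresholding operator, its proximal operator: coefficients whose magnitude is at most rho become exactly zero and the rest shrink toward zero by rho. This is a generic illustration, not the update LUCIDus performs internally; soft_threshold and the coefficient values are hypothetical.

```r
# Soft-thresholding: the proximal operator of the L1 (LASSO) penalty.
# Coefficients with |b| <= rho become exactly 0; the rest shrink toward 0 by rho.
soft_threshold <- function(b, rho) sign(b) * pmax(abs(b) - rho, 0)

beta_unpenalized <- c(0.80, -0.15, 0.05, -1.20)  # made-up G -> X effects
beta_rho0 <- soft_threshold(beta_unpenalized, rho = 0)     # unchanged
beta_rho02 <- soft_threshold(beta_unpenalized, rho = 0.2)
beta_rho02            # 0.6  0.0  0.0 -1.0
beta_rho02 != 0       # TRUE FALSE FALSE TRUE: coefficients that survive
```

With rho = 0, matching the default Rho_G = 0, nothing is shrunk; exact zeros appear only once the penalty is positive, which is what lets a LASSO penalty act as variable selection.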
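The "lod" option of init_impute is simple enough to sketch directly. The helper impute_lod below is hypothetical and is not the LUCIDus source; it follows the description of init_impute, taking each column's minimum observed value as its LOD. Leaving a column with no observed values untouched is an assumption of this sketch.

```r
# Hypothetical sketch of the "lod" initial imputation (not the LUCIDus source):
# each missing value in column j of Z is replaced by LOD_j / sqrt(2), where
# LOD_j is taken to be the minimum observed value of column j.
impute_lod <- function(Z) {
  Z <- as.matrix(Z)
  for (j in seq_len(ncol(Z))) {
    miss <- is.na(Z[, j])
    # Assumption: a column with no observed values is left untouched
    if (any(miss) && !all(miss)) {
      lod <- min(Z[, j], na.rm = TRUE)
      Z[miss, j] <- lod / sqrt(2)
    }
  }
  Z
}

# Toy omics matrix (3 observations, 2 features), one missing value per column
Z_example <- cbind(c(2.0, NA, 4.0),
                   c(1.5, 3.0, NA))
Z_imputed <- impute_lod(Z_example)
Z_imputed[2, 1]   # 2 / sqrt(2), about 1.414
Z_imputed[3, 2]   # 1.5 / sqrt(2), about 1.061
```

Because the rule divides the minimum by sqrt(2), the imputed value falls below the observed minimum only when that minimum is positive, as it is for concentrations; for centred or log-scaled data with negative values it would land above the minimum.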
library(LUCIDus)
# Simulate exposures (G), five omics layers (Z1-Z5), a continuous outcome (Y)
# and covariates for the G -> X (CoG) and X -> Y (CoY) models
i <- 1008
set.seed(i)
G <- matrix(rnorm(500), nrow = 100)
Z1 <- matrix(rnorm(1000), nrow = 100)
Z2 <- matrix(rnorm(1000), nrow = 100)
Z3 <- matrix(rnorm(1000), nrow = 100)
Z4 <- matrix(rnorm(1000), nrow = 100)
Z5 <- matrix(rnorm(1000), nrow = 100)
Z <- list(Z1 = Z1, Z2 = Z2, Z3 = Z3, Z4 = Z4, Z5 = Z5)
Y <- rnorm(100)
CoY <- matrix(rnorm(200), nrow = 100)
CoG <- matrix(rnorm(200), nrow = 100)
# LUCID in serial: each omics matrix is an "early" sub-model with K = 2 clusters
fit1 <- estimate_lucid(G = G, Z = Z, Y = Y, K = list(2, 2, 2, 2, 2),
                       lucid_model = "serial",
                       family = "normal",
                       seed = i,
                       CoG = CoG, CoY = CoY,
                       useY = TRUE)
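Because the expected forms of Z and K depend on lucid_model, a quick shape check before calling estimate_lucid can catch mistakes early. The helper check_lucid_inputs below is hypothetical and not part of LUCIDus; it encodes only the rules stated in the argument descriptions (every omics matrix has N rows, and every number of clusters is an integer of at least 2) and checks nothing about Y, CoG or CoY.

```r
# Hypothetical helper, NOT part of LUCIDus: encodes the shape rules for Z and K
# given in the argument descriptions. It does not validate Y, CoG or CoY.
is_valid_k <- function(k) {
  is.numeric(k) && length(k) == 1 && k >= 2 && k == round(k)
}

check_lucid_inputs <- function(lucid_model, G, Z, K) {
  N <- nrow(G)
  ok_mat <- function(m) is.matrix(m) && nrow(m) == N  # N rows, like G
  all_ok <- function(xs, f) all(vapply(xs, f, logical(1)))
  switch(lucid_model,
    early = ok_mat(Z) && is_valid_k(K),
    parallel = is.list(Z) && all_ok(Z, ok_mat) &&
      length(K) == length(Z) && all_ok(K, is_valid_k),
    serial = is.list(Z) && is.list(K) && length(K) == length(Z) &&
      all(mapply(function(z, k) {
        if (is.matrix(z)) {
          ok_mat(z) && is_valid_k(k)            # an "early" sub-model
        } else {
          is.list(z) && all_ok(z, ok_mat) &&    # a "parallel" sub-model
            length(k) == length(z) && all_ok(k, is_valid_k)
        }
      }, Z, K)),
    stop("lucid_model must be \"early\", \"parallel\" or \"serial\"")
  )
}

set.seed(1008)
G <- matrix(rnorm(500), nrow = 100)
Z_list <- replicate(3, matrix(rnorm(1000), nrow = 100), simplify = FALSE)
check_lucid_inputs("early", G, Z_list[[1]], K = 2)             # TRUE
check_lucid_inputs("parallel", G, Z_list, K = c(2, 3, 2))      # TRUE
check_lucid_inputs("serial", G, list(Z_list[[1]], Z_list[2:3]),
                   K = list(2, list(2, 2)))                     # TRUE
check_lucid_inputs("parallel", G, Z_list, K = c(2, 1, 2))      # FALSE (K < 2)
```

The serial example above, with five matrices and K = list(2, 2, 2, 2, 2), passes this check because every element of Z is a matrix paired with a single integer.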