estimate_lucid: Fit LUCID models with one or multiple omics layers

Description

EM algorithm to estimate LUCID with one or multiple omics layers

Usage

estimate_lucid(
  lucid_model = c("early", "parallel", "serial"),
  G,
  Z,
  Y,
  CoG = NULL,
  CoY = NULL,
  K,
  init_omic.data.model = "EEV",
  useY = TRUE,
  tol = 0.001,
  max_itr = 1000,
  max_tot.itr = 10000,
  Rho_G = 0,
  Rho_Z_Mu = 0,
  Rho_Z_Cov = 0,
  family = c("normal", "binary"),
  seed = 123,
  init_impute = c("mix", "lod"),
  init_par = c("mclust", "random"),
  verbose = FALSE
)

Value

A list containing the following objects:

res_Beta: estimates of the G->X associations
res_Mu: estimates of the means (mu) of the X->Z associations
res_Sigma: estimates of the covariances (sigma) of the X->Z associations
res_Gamma: estimates of the X->Y associations
inclusion.p: inclusion probability of cluster assignment for each observation
K: number of latent clusters for "early"; list of numbers of latent clusters for "parallel" and "serial"
var.names: names for the G, Z, Y variables
init_omic.data.model: pre-specified geometric model of multi-omics data
likelihood: converged LUCID model log likelihood
family: the distribution of the outcome
select: for LUCID early integration only, indicators of whether each exposure and omics feature is selected
useY: whether this LUCID model is supervised
Z: multi-omics data
init_impute: pre-specified imputation method
init_par: pre-specified parameter initialization method
Rho: for LUCID early integration only, pre-specified regularization tuning parameter
N: number of observations
submodel: for LUCID in serial only, storing all the submodels
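
As a rough illustration of the returned list, the sketch below fits an early integration model on simulated noise and looks at a few of the elements named above. It assumes the LUCIDus package is installed and is skipped otherwise; the simulated data and the object name fit are illustrative only, and the exact shape of each element may vary between package versions.

```r
# Sketch only: requires the LUCIDus package, skipped if it is not installed.
if (requireNamespace("LUCIDus", quietly = TRUE)) {
  set.seed(1008)
  G <- matrix(rnorm(500), nrow = 100)   # 100 x 5 exposures (simulated)
  Z <- matrix(rnorm(1000), nrow = 100)  # 100 x 10 omics features (simulated)
  Y <- rnorm(100)                       # continuous outcome (simulated)

  fit <- LUCIDus::estimate_lucid(lucid_model = "early",
                                 G = G, Z = Z, Y = Y, K = 2,
                                 family = "normal", seed = 1008)

  print(names(fit))             # element names as listed under Value
  print(fit$likelihood)         # log likelihood of the converged model
  print(head(fit$inclusion.p))  # per-observation inclusion probabilities
}
```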

Arguments

lucid_model: Specifies the LUCID model: "early" for early integration, "parallel" for LUCID in parallel, "serial" for LUCID in serial
G: an N by P matrix representing exposures
Z: Omics data. If "early", an N by M matrix; if "parallel", a list in which element i is a matrix with N rows and P_i features; if "serial", a list in which element i is either a matrix with N rows and P_i features or a list of two or more matrices, each with N rows and its own number of features
Y: a length N vector representing the outcome
CoG: an N by V matrix representing covariates to be adjusted for G -> X
CoY: an N by K matrix representing covariates to be adjusted for X -> Y
K: Number of latent clusters. If "early", an integer greater than or equal to 2; if "parallel", an integer vector of the same length as Z, with each element an integer greater than or equal to 2; if "serial", a list of the same length as Z, where each element is either an integer (as for "early") or a list of integers (as for "parallel")
init_omic.data.model: a vector of strings specifying the geometric model of the omics data. If NULL, See more in ?mclust::mclustModelNames
useY: logical, if TRUE, EM algorithm fits a supervised LUCID; otherwise unsupervised LUCID.
tol: stopping criterion for the EM algorithm
max_itr: Maximum iterations of the EM algorithm. If the EM algorithm iterates more than max_itr without converging, the EM algorithm is forced to stop.
max_tot.itr: Maximum number of total iterations for the estimate_lucid function. estimate_lucid may run the EM algorithm multiple times if the algorithm fails to converge.
Rho_G: A scalar. The LASSO penalty used to regularize exposures. To tune the penalty, use the wrapper function lucid. Currently implemented only for LUCID early integration.
Rho_Z_Mu: A scalar. The LASSO penalty used to regularize cluster-specific means for omics data (Z). To tune the penalty, use the wrapper function lucid. Currently implemented only for LUCID early integration.
Rho_Z_Cov: A scalar. The graphical LASSO penalty used to estimate sparse cluster-specific variance-covariance matrices for omics data (Z). To tune the penalty, use the wrapper function lucid. Currently implemented only for LUCID early integration.
family: The distribution of the outcome
seed: Random seed to initialize the EM algorithm
init_impute: Method to initialize the imputation of missing values in LUCID. mix uses mclust::imputeData, which applies the EM algorithm for the unrestricted general location model from the mix package, to impute missing values in the omics data; lod initializes the imputation by replacing missing values with LOD / sqrt(2), where LOD is the minimum of each variable in the omics data.
init_par: For "early", the method used to initialize the EM algorithm: mclust initializes the parameters using the mclust package; random initializes them by drawing from a uniform distribution. For "parallel", mclust is the default for quick convergence. For "serial", each sub-model follows the rule above according to whether it is an "early" or a "parallel" model.
verbose: A flag indicating whether detailed information for each iteration of the EM algorithm is printed to the console. Default is FALSE.
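
To make the lod rule for init_impute concrete, here is a minimal base R sketch of the formula as described above: each missing value is replaced by the column minimum divided by sqrt(2). The helper impute_lod is hypothetical and is not part of the package; the package's own implementation may differ in detail.

```r
# Hypothetical helper (not a LUCIDus function): replace each NA in a column
# by LOD / sqrt(2), taking LOD to be the observed minimum of that column.
impute_lod <- function(Z) {
  apply(Z, 2, function(col) {
    lod <- min(col, na.rm = TRUE)     # LOD approximated by the column minimum
    col[is.na(col)] <- lod / sqrt(2)  # fill missing entries
    col
  })
}

# Toy omics matrix with one missing value per column
Z_toy <- matrix(c(4, NA, 2,
                  8, 6, NA), nrow = 3)
Z_imputed <- impute_lod(Z_toy)
print(Z_imputed)
```

The rule presumes non-negative measurements, as is usual for concentration-type omics data; for data centered around zero the column minimum is not a meaningful detection limit.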

Examples

library(LUCIDus)

# Simulate 100 observations: 5 exposures and five omics layers of 10 features each
i <- 1008
set.seed(i)
G <- matrix(rnorm(500), nrow = 100)
Z1 <- matrix(rnorm(1000), nrow = 100)
Z2 <- matrix(rnorm(1000), nrow = 100)
Z3 <- matrix(rnorm(1000), nrow = 100)
Z4 <- matrix(rnorm(1000), nrow = 100)
Z5 <- matrix(rnorm(1000), nrow = 100)
Z <- list(Z1 = Z1, Z2 = Z2, Z3 = Z3, Z4 = Z4, Z5 = Z5)
Y <- rnorm(100)
# Two covariates each for the X -> Y and G -> X models
CoY <- matrix(rnorm(200), nrow = 100)
CoG <- matrix(rnorm(200), nrow = 100)
# Fit a supervised LUCID in serial with two latent clusters per layer
fit1 <- estimate_lucid(G = G, Z = Z, Y = Y, K = list(2, 2, 2, 2, 2),
                       lucid_model = "serial",
                       family = "normal",
                       seed = i,
                       CoG = CoG, CoY = CoY,
                       useY = TRUE)
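
The example above uses the "serial" setting. As a hedged sketch of how Z and K might be shaped for the other two settings, following the argument descriptions: "early" takes a single N by M matrix with one integer K, and "parallel" takes a list of matrices with an integer vector K of the same length. Combining layers with cbind for "early" is an assumption made here for illustration, as are all object names. The model fits run only if LUCIDus is installed.

```r
set.seed(1008)
N  <- 100
G  <- matrix(rnorm(N * 5), nrow = N)   # exposures
Z1 <- matrix(rnorm(N * 10), nrow = N)  # omics layer 1
Z2 <- matrix(rnorm(N * 10), nrow = N)  # omics layer 2
Y  <- rnorm(N)                         # continuous outcome

# "early": one N x M matrix and a single integer K >= 2
# (column-binding the layers is an illustrative assumption)
Z_early <- cbind(Z1, Z2)
K_early <- 2

# "parallel": a list of matrices and an integer vector K of the same length
Z_par <- list(Z1 = Z1, Z2 = Z2)
K_par <- c(2, 2)

# Basic shape checks implied by the argument descriptions
stopifnot(nrow(Z_early) == N,
          all(vapply(Z_par, nrow, integer(1)) == N),
          length(K_par) == length(Z_par))

if (requireNamespace("LUCIDus", quietly = TRUE)) {
  fit_early <- LUCIDus::estimate_lucid(lucid_model = "early",
                                       G = G, Z = Z_early, Y = Y, K = K_early,
                                       family = "normal", seed = 1008)
  fit_par <- LUCIDus::estimate_lucid(lucid_model = "parallel",
                                     G = G, Z = Z_par, Y = Y, K = K_par,
                                     family = "normal", seed = 1008)
}
```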