gimmeSEM: Group iterative multiple model estimation.

Description

This function identifies structural equation models for each individual that consist of both group-level and individual-level paths.

Usage

gimmeSEM(data               = NULL,
         out                = NULL,
         sep                = NULL,
         header             = NULL,
         ar                 = TRUE,
         plot               = TRUE,
         subgroup           = FALSE,
         sub_feature        = "lag & contemp",
         sub_method         = "Walktrap",
         sub_sim_thresh     = "lowest",
         confirm_subgroup   = NULL,
         paths              = NULL,
         exogenous          = NULL,
         outcome            = NULL,
         conv_vars          = NULL,
         conv_length        = 16,
         conv_interval      = 1,
         mult_vars          = NULL,
         mean_center_mult   = FALSE,
         standardize        = FALSE,
         groupcutoff        = .75,
         subcutoff          = .75,
         diagnos            = FALSE,
         ms_allow           = FALSE,
         ms_tol             = 1e-5,
         lv_model           = NULL,
         lv_estimator       = "miiv",
         lv_scores          = "regression",
         lv_miiv_scaling    = "first.indicator",
         lv_final_estimator = "miiv",
         lasso_model_crit   = NULL,
         hybrid             = FALSE,
         VAR                = FALSE,
         dir_prop_cutoff    = 0,
         ordered            = NULL)

Arguments

data: The path to the directory containing the data files, or the name of a list containing each individual's time series. Each file or list element must contain one T (time points) by p (variables) matrix for a single individual, with variables in columns and time points in rows. All individuals must have the same p variables but can have different numbers of observations T.
out: The path to the directory where results will be stored (optional). If specified, output files are written to this directory, replacing any existing copies. If the directory does not exist, it will be created.
sep: The field separator used in the data files, following R conventions (as in read.table): "" indicates whitespace-delimited, "\t" tab-delimited, and "," comma-delimited. Only needs to be specified when reading data from a directory.
header: Logical. Set to TRUE if the data files contain a header row. Only needs to be specified when reading data from a directory.
ar: Logical. If TRUE, begins the search for the group model with autoregressive (AR) paths freed for estimation. If ms_allow = TRUE, setting ar = FALSE is recommended, because multiple solutions are unlikely to be found when ar = TRUE. Defaults to TRUE.
plot: Logical. If TRUE, graphs depicting relations among the variables of interest are created automatically. Solid lines represent contemporaneous relations (lag 0) and dashed lines represent lagged relations (lag 1). For individual-level plots, red paths represent positive weights, blue paths represent negative weights, and path width corresponds to the estimated path weight. For the group-level plot, black represents group-level paths, grey represents individual-level paths, and (if subgroup = TRUE) green represents subgroup-level paths; edge width corresponds to the number of individuals with that path. Defaults to TRUE.
subgroup: Logical. If TRUE, subgroups are generated based on similarities in model features using community detection from the igraph package (Walktrap by default; see sub_method). When ms_allow = TRUE, subgroup should be set to FALSE. Defaults to FALSE.
sub_feature: Feature(s) used to subgroup individuals. Defaults to "lag & contemp" (lagged and contemporaneous features), which is the original method. Use "lagged" or "contemp" to subgroup solely on features related to lagged or contemporaneous relations, respectively.
sub_method: Community detection method used to cluster individuals into subgroups. Options align with those available in the igraph package: "Walktrap" (default), "Infomap", "Louvain", "Edge Betweenness", "Label Prop", "Fast Greedy", "Leading Eigen", and "Spinglass".
sub_sim_thresh: Threshold for inducing sparsity in the similarity matrix. Options are: a proportion of edges in the similarity matrix to set to zero (e.g., .25 sets the lowest quartile of similarities to zero); "lowest" (default), which subtracts the minimum value from all values; and "search", which searches across thresholds for the one providing the highest modularity.
confirm_subgroup: Data frame. Only available when subgroup = TRUE. The data frame should contain two columns: the first specifies file labels (the names of the data files without file extensions), and the second contains integer values (beginning at 1) specifying each individual's subgroup membership. Defaults to NULL.
paths: lavaan-style syntax specifying paths with which to begin model estimation (optional). That is, Y~X indicates that Y is regressed on X, or X predicts Y. Paths can also be set to a specific value using lavaan-style syntax (e.g., 'V4 ~ 0.5*V3'), or set to 0 so that they will not be estimated (e.g., 'V4 ~ 0*V3'). If no header is used, variables should be referred to with V followed (with no separation) by the column number. If a header is used, variables should be referred to by their names. To reference a lagged variable, append "lag" to the variable name with no separation (e.g., V4lag). Defaults to NULL.
exogenous: Vector of variable names to be treated as exogenous (optional). That is, exogenous variable X can predict Y but cannot be predicted by Y. If no header is used, variables should be referred to with V followed (with no separation) by the column number. If a header is used, variables should be referred to by their names. By default, lagged effects of exogenous variables are not included in the model search; if lagged paths are wanted, "&lag" should be added to the end of the variable name with no separation. Defaults to NULL.
outcome: Vector of variable names to be treated as outcomes (optional). An outcome variable can be predicted by other variables but cannot predict them. If no header is used, variables should be referred to with V followed (with no separation) by the column number. If a header is used, variables should be referred to by their names. Defaults to NULL.
conv_vars: Vector of variable names to be convolved via smoothed Finite Impulse Response (sFIR). Note that conv_vars are not automatically treated as exogenous; to treat them as exogenous, also list them in the exogenous argument. Variables listed in conv_vars must be binary, and lagged versions of them are not supported. If the endogenous variables contain missing data, their values are imputed for the convolution operation only. Defaults to NULL.
conv_length: Expected response length in seconds. For functional MRI BOLD, 16 seconds (default) is typical for the hemodynamic response function.
conv_interval: Interval between data acquisitions, in seconds. For fMRI studies, this is the repetition time (TR). Currently, conv_length/conv_interval must be an integer. Defaults to 1.
mult_vars: Vector of variable names to be multiplied to explore bilinear/modulatory effects (optional). All product terms are treated as exogenous (X can predict Y but cannot be predicted by Y). Within the vector, multiplication of two variables is indicated with an asterisk (e.g., V1*V2). If no header is used, variables should be referred to with V followed by the column number (with no separation). If a header is used, variables should be referred to by their names. To multiply by the lag-1 value of a variable, follow the variable name with "lag" with no separation (e.g., V1*V2lag). Defaults to NULL.
mean_center_mult: Logical. If TRUE, the variables indicated in mult_vars will be mean-centered before being multiplied together. Defaults to FALSE.
standardize: Logical. If TRUE, all variables are standardized to have a mean of zero and a standard deviation of one. Defaults to FALSE.
groupcutoff: Cutoff value for group-level paths. Defaults to .75, indicating that a path must be significant for at least 75% of individuals to be included as a group-level path.
subcutoff: Cutoff value for subgroup-level paths. Defaults to .75, indicating that a path must be significant for at least 75% of the individuals in a subgroup to be considered a subgroup-level path.
diagnos: Logical. If TRUE, provides internal output for diagnostic purposes. Defaults to FALSE.
ms_allow: Logical. If TRUE, provides multiple solutions when more than one path has identical modification index values. When ms_allow = TRUE, setting ar = FALSE is recommended because multiple solutions are unlikely to be found when ar = TRUE; subgroup should also be set to FALSE. Output files for individuals with multiple solutions represent the last solution found for that individual, not necessarily the best one. Defaults to FALSE.
ms_tol: Precision used when evaluating similarity of modification indices when ms_allow = TRUE. We recommend that ms_tol not be greater than the default, especially when standardize=TRUE. Defaults to 1e-5.
lv_model: Invoke latent variable modeling by providing the measurement model syntax here. lavaan conventions are used for relating observed variables to factors. Defaults to NULL.
lv_estimator: Estimator used for factor analysis. Options are "miiv" (default), "pml" (pseudo-ML) or "svd".
lv_scores: Method used to estimate latent variable scores from the parameters obtained in the factor analysis when lv_model is not NULL. Options are "regression" (default) and "bartlett".
lv_miiv_scaling: Type of scaling indicator to use when "miiv" selected for lv_estimator. Options are "first.indicator" (Default; the first observed variable in the measurement equation is used), "group" (best one for the group), or "individual" (each individual has the best one for them according to R2).
lv_final_estimator: Estimator used for the final model estimation: "miiv" (default) or "pml" (pseudo-ML).
lasso_model_crit: When not NULL, invokes the multiLASSO approach for the GIMME model search procedure. The value specifies the model selection criterion: 'bic' (BIC), 'aic' (AIC), 'aicc' (corrected AIC), 'hqc' (Hannan-Quinn criterion), or 'cv' (cross-validation). Defaults to NULL.
hybrid: Logical. If TRUE, enables hybrid-VAR models where both directed contemporaneous paths and contemporaneous covariances among residuals are candidate relations in the search space. Defaults to FALSE.
VAR: Logical. If TRUE, fits VAR models in which contemporaneous covariances among residuals, rather than directed contemporaneous paths, are candidate relations in the search space. Defaults to FALSE.
dir_prop_cutoff: Option to require that the directionality of a relation be higher than that of the reverse direction for a prespecified proportion of individuals. Defaults to 0.
ordered: A character vector containing the names of all ordered categorical variables in the model. Defaults to NULL.
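
As a rough illustration of the two non-"search" options of sub_sim_thresh, the base R sketch below sparsifies a toy symmetric similarity matrix: "lowest" subtracts the minimum similarity, and a proportion such as .25 zeroes similarities below that quantile. The helper sparsify() is hypothetical, not gimme's implementation, and it assumes the diagonal is ignored when computing the minimum and the quantile.

```r
# Hypothetical helper (not part of gimme) mirroring the two non-"search"
# options of sub_sim_thresh on a symmetric similarity matrix.
sparsify <- function(sim, thresh = "lowest") {
  off <- sim[upper.tri(sim)]              # unique off-diagonal similarities
  if (identical(thresh, "lowest")) {
    out <- sim - min(off)                 # shift so the weakest tie becomes 0
  } else {
    cut <- quantile(off, probs = thresh, names = FALSE)
    out <- sim
    out[sim < cut] <- 0                   # drop edges below the chosen quantile
  }
  diag(out) <- 0                          # no self-similarity edges
  out
}

# Toy similarity matrix for 4 individuals
sim <- matrix(c(0, 4, 1, 2,
                4, 0, 3, 5,
                1, 3, 0, 6,
                2, 5, 6, 0), nrow = 4, byrow = TRUE)
sparsify(sim, "lowest")
sparsify(sim, 0.25)
```

With only six pairwise similarities, the .25 cutoff zeroes two of them here, so the realized proportion is only approximate in small samples.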
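
The paths, exogenous, and mult_vars arguments refer to lag-1 variables by appending "lag" to a variable name (e.g., V4lag). As a conceptual illustration only, the base R sketch below pairs each time point with the previous one so that column V4lag holds V4 at time t-1. make_lagged() is a hypothetical helper, not a gimme function, and gimme may build its lagged block differently internally.

```r
# Hypothetical helper (not part of gimme): pair each row t with row t - 1
# so that "<name>lag" columns hold the lag-1 values referenced in `paths`.
make_lagged <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)
  # Without a header, variables are named V1, V2, ... by column number
  if (is.null(colnames(x))) colnames(x) <- paste0("V", seq_len(ncol(x)))
  current <- x[-1, , drop = FALSE]        # rows 2..T   (time t)
  lagged  <- x[-n, , drop = FALSE]        # rows 1..T-1 (time t - 1)
  colnames(lagged) <- paste0(colnames(x), "lag")
  out <- cbind(lagged, current)
  rownames(out) <- NULL
  out
}

# Toy series: 5 time points, 4 variables, no column names
x <- matrix(1:20, nrow = 5, ncol = 4)
lagged_x <- make_lagged(x)
lagged_x[, c("V4lag", "V4")]
```

In this layout, a path such as 'V3 ~ V4lag' corresponds to regressing column V3 on column V4lag, i.e., V3 at time t on V4 at time t-1.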
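
The requirement that conv_length/conv_interval be an integer can be checked before calling gimmeSEM. The ratio is the number of acquisitions spanned by the expected response (for example, 16 s at a 2 s TR covers 8 samples). conv_samples() below is a hypothetical helper, not part of gimme.

```r
# Hypothetical pre-check (not part of gimme): number of samples covered by
# the expected response; stops if conv_length / conv_interval is fractional.
conv_samples <- function(conv_length = 16, conv_interval = 1, tol = 1e-8) {
  ratio <- conv_length / conv_interval
  # Tolerance guards against floating-point noise, e.g. 16 / 0.8
  if (abs(ratio - round(ratio)) > tol) {
    stop(sprintf("conv_length / conv_interval = %g is not an integer", ratio))
  }
  as.integer(round(ratio))
}

conv_samples(16, 2)    # TR = 2 s
conv_samples(16, 0.8)  # TR = 0.8 s
# conv_samples(16, 3) stops: 16 / 3 is not an integer
```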
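
mult_vars forms product terms such as V1*V2, optionally from mean-centered inputs when mean_center_mult = TRUE. The sketch below shows only that arithmetic in base R on a toy matrix; make_product() is a hypothetical helper for illustration, not gimme's internal code.

```r
# Hypothetical helper (not part of gimme): product of two columns,
# optionally mean-centering each column before multiplying.
make_product <- function(x, a, b, center = FALSE) {
  va <- x[, a]
  vb <- x[, b]
  if (center) {
    va <- va - mean(va)
    vb <- vb - mean(vb)
  }
  va * vb
}

x <- cbind(V1 = c(1, 2, 3, 4), V2 = c(2, 2, 4, 4))
raw      <- make_product(x, "V1", "V2")                # 2 4 12 16
centered <- make_product(x, "V1", "V2", center = TRUE) # 1.5 0.5 0.5 1.5
```

Centering before multiplying reduces the nonessential collinearity between a product term and its components, which is the usual motivation for centering interaction terms.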
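
groupcutoff and subcutoff are proportions of individuals. Given a hypothetical logical matrix recording whether each candidate path is significant for each individual, the paths meeting a .75 cutoff can be tallied as below, assuming "at least" as in the subcutoff description. This shows only the counting rule; the GIMME search adds group-level paths iteratively (Gates & Molenaar, 2012), so a single-pass tally is a simplification.

```r
# Hypothetical significance indicators: rows = individuals, columns = paths
sig <- rbind(
  ind1 = c("V2~V1" = TRUE,  "V3~V4lag" = TRUE,  "V4~V3" = FALSE),
  ind2 = c("V2~V1" = TRUE,  "V3~V4lag" = FALSE, "V4~V3" = FALSE),
  ind3 = c("V2~V1" = TRUE,  "V3~V4lag" = TRUE,  "V4~V3" = TRUE),
  ind4 = c("V2~V1" = FALSE, "V3~V4lag" = TRUE,  "V4~V3" = FALSE)
)
groupcutoff <- 0.75

# Proportion of individuals for whom each path is significant
prop_sig <- colMeans(sig)
group_paths <- names(prop_sig)[prop_sig >= groupcutoff]
group_paths
```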

Author

Zachary Fisher, Kathleen Gates, & Stephanie Lane

Details

Output is returned as a list of results when the function's value is saved as an object, and output files are written to a directory when the out argument is used.

References

Gates, K.M. & Molenaar, P.C.M. (2012). Group search algorithm recovers effective connectivity maps for individuals in homogeneous and heterogeneous samples. NeuroImage, 63, 310-319.

Lane, S.T. & Gates, K.M. (2017). Automated selection of robust individual-level structural equation models for time series data. Structural Equation Modeling.

Beltz, A.M. & Molenaar, P.C.M. (2016). Dealing with multiple solutions in structural vector autoregressive models. Multivariate Behavioral Research, 51(2-3), 357-373.

Examples


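if (FALSE) {
  library(gimme)  # provides gimmeSEM() and the simData example list

  # Begin the search with V1 -> V2 (contemporaneous) and V4 at t-1 -> V3
  paths <- 'V2 ~ V1
            V3 ~ V4lag'

  fit <- gimmeSEM(data     = simData,
                  out      = "C:/simData_out",  # output files written here
                  subgroup = TRUE,
                  paths    = paths)

  # Summaries for the full sample and for subgroup 1
  print(fit, mean = TRUE)
  print(fit, subgroup = 1, mean = TRUE)
  # Estimates for one individual, identified by file label
  print(fit, file = "group_1_1", estimates = TRUE)
  # Fit measures for subgroup 2
  print(fit, subgroup = 2, fitMeasures = TRUE)
  # Plot one individual's model
  plot(fit, file = "group_1_1")
}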
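
The example above relies on the simData object shipped with gimme. A sketch of preparing your own inputs in memory follows: a named list of T by p matrices (same variables, different lengths) for data, and a two-column data frame for confirm_subgroup. The random data and the column names file and subgroup are hypothetical, and the list names are assumed to serve as the file labels that confirm_subgroup refers to. Only the final call, kept inside if (FALSE) as above, uses gimme.

```r
set.seed(1)
p <- 4
n_obs <- c(ind1 = 100, ind2 = 120, ind3 = 90, ind4 = 110)

# One T x p matrix per individual: same columns, different numbers of rows
my_data <- lapply(n_obs, function(n) {
  m <- matrix(rnorm(n * p), nrow = n, ncol = p)
  colnames(m) <- paste0("V", seq_len(p))
  m
})

# Known membership: file labels in column 1, integers starting at 1 in column 2
my_groups <- data.frame(
  file     = names(my_data),
  subgroup = c(1L, 1L, 2L, 2L),
  stringsAsFactors = FALSE
)

if (FALSE) {
  library(gimme)
  fit <- gimmeSEM(data             = my_data,
                  subgroup         = TRUE,
                  confirm_subgroup = my_groups)
}
```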
