UVA: Unique Variable Analysis

Description

Identifies redundant variables in a multivariate dataset using several association methods and types of significance values (see Christensen, Garrido, & Golino, 2020 for more details).

Usage

UVA(
  data,
  n = NULL,
  model = c("glasso", "TMFG"),
  corr = c("cor_auto", "pearson", "spearman"),
  method = c("cor", "pcor", "wTO"),
  type = c("adapt", "alpha", "threshold"),
  sig,
  key = NULL,
  reduce = TRUE,
  auto = TRUE,
  label.latent = FALSE,
  reduce.method = c("latent", "remove", "sum"),
  lavaan.args = list(),
  adhoc = TRUE,
  plot.redundancy = FALSE,
  plot.args = list()
)

Value

Returns a list:

redundancy

A list containing several objects:

redundant Vectors nested within the list; each vector holds the nodes that are redundant with the node named by that list element
data Original data
correlation Correlation matrix of original data
weights Weights determined by weighted topological overlap, partial correlation, or zero-order correlation
network If method = "wTO", then the network computed following EGA with EBICglasso network estimation
plot If plot.redundancy = TRUE, then a plot of all redundancies found
descriptives
- basic A vector containing the mean, standard deviation, median, median absolute deviation (MAD), 3 times the MAD, 6 times the MAD, minimum, maximum, and critical value for the overlap measure (i.e., weighted topological overlap, partial correlation, or threshold)
- centralTendency A matrix for all (absolute) non-zero values and their respective standard deviation from the mean and median absolute deviation from the median
method Returns method argument
type Returns type argument
distribution If type != "threshold", then distribution that was used to determine significance
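The statistics listed under descriptives can be reproduced in base R for any vector of overlap weights. The following is a minimal sketch on made-up values; the object names (`w`, `basic`) are illustrative, and whether UVA uses the scaled MAD (base R's default `constant = 1.4826`) or the raw MAD is an assumption here, as is the omission of the critical value.

```r
# Made-up non-zero overlap weights (illustrative only)
w <- c(0.02, 0.05, 0.10, 0.04, 0.30)

# Base R's mad() scales by 1.4826 by default; assumed, not confirmed, for UVA
w_mad <- mad(w)

basic <- c(
  mean   = mean(w),
  sd     = sd(w),
  median = median(w),
  MAD    = w_mad,
  MAD3   = 3 * w_mad,
  MAD6   = 6 * w_mad,
  min    = min(w),
  max    = max(w)
)

round(basic, 3)
```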

reduced

If reduce = TRUE, then a list containing:

data New data with redundant variables merged or removed
merged A matrix containing the variables that were decided to be redundant with one another
method Method used to perform redundancy reduction

adhoc

If adhoc = TRUE, then the adhoc check containing the same objects as in the redundancy list object in the output

Arguments

data

Matrix or data frame. Input can either be data or a correlation matrix

n

Numeric. If input in data is a correlation matrix, then sample size is required. Defaults to NULL

model

Character. A string indicating the method to use. Current options are:

glasso Estimates the Gaussian graphical model using graphical LASSO with extended Bayesian information criterion to select optimal regularization parameter. This is the default method
TMFG Estimates a Triangulated Maximally Filtered Graph

corr

Type of correlation matrix to compute. The default uses cor_auto. Current options are:

cor_auto Computes the correlation matrix using the cor_auto function from qgraph.
pearson Computes Pearson's correlation coefficient using the pairwise complete observations via the cor function.
spearman Computes Spearman's correlation coefficient using the pairwise complete observations via the cor function.
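For the "pearson" and "spearman" options, the description above corresponds to base R's `cor` with pairwise deletion. A minimal sketch on a small made-up data frame (`toy` is illustrative and not part of the package):

```r
# Toy data with one missing value (illustrative only)
toy <- data.frame(
  a = c(1, 2, 3, 4, 5, 6),
  b = c(2, 1, 4, 3, 6, 5),
  c = c(6, 5, NA, 3, 2, 1)
)

# Pearson correlations from pairwise complete observations
r_pearson <- cor(toy, use = "pairwise.complete.obs", method = "pearson")

# Spearman correlations from pairwise complete observations
r_spearman <- cor(toy, use = "pairwise.complete.obs", method = "spearman")

round(r_pearson, 2)
```

With pairwise deletion, each coefficient uses every row where both variables are observed, so the row with the missing `c` still contributes to the correlation between `a` and `b`.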

method

Character. Computes weighted topological overlap ("wTO" using EBICglasso), partial correlations ("pcor"), or correlations ("cor"). Defaults to "wTO"
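To make the "pcor" and "wTO" measures concrete, here is a minimal base R sketch. The partial correlations come from the standardized inverse of a correlation matrix, and the weighted topological overlap follows the formula in Nowick et al. (2009). The helper names (`pcor_from_cor`, `wto_from_network`) are hypothetical, and the package's own implementation may differ in details such as network estimation or sign handling.

```r
# Partial correlations from a correlation matrix R (hypothetical helper)
pcor_from_cor <- function(R) {
  P <- solve(R)                                # precision matrix
  pc <- -P / sqrt(outer(diag(P), diag(P)))     # standardize and flip sign
  diag(pc) <- 0
  pc
}

# Weighted topological overlap for a weighted adjacency matrix A
# (hypothetical helper following Nowick et al., 2009)
wto_from_network <- function(A) {
  diag(A) <- 0
  k <- rowSums(abs(A))                         # node strength
  shared <- A %*% A                            # sum over u of a_iu * a_uj
  overlap <- (shared + A) / (outer(k, k, pmin) + 1 - abs(A))
  diag(overlap) <- 0
  overlap
}

# Three nodes, all connected with weight 0.5 (illustrative only)
A <- matrix(0.5, 3, 3)
diag(A) <- 0
wto <- wto_from_network(A)

# Two variables correlated at 0.5 (illustrative only)
R <- matrix(c(1, 0.5, 0.5, 1), 2, 2)
pc <- pcor_from_cor(R)
```

In the three-node example, each pair shares one neighbour (0.5 * 0.5 = 0.25) plus a direct edge of 0.5, divided by 1 + 1 - 0.5 = 1.5, giving an overlap of 0.5.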

type

Character. Type of significance. Computes significance using the standard p-value ("alpha"), the adaptive alpha p-value ("adapt"; see adapt.a), or a fixed threshold ("threshold"). Defaults to "threshold"

sig

Numeric. p-value for significance of overlap (defaults to .05). Defaults for "threshold" for each method:

"wTO" .25
"pcor" .35
"cor" .50
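With type = "threshold", redundancy amounts to comparing each pairwise weight against the cutoff for the chosen method. The sketch below illustrates that idea only: `flag_pairs` is a hypothetical helper, and both the use of absolute values and the strict `>` comparison are assumptions rather than documented behaviour.

```r
# Default thresholds listed above, keyed by method
default_sig <- c(wTO = 0.25, pcor = 0.35, cor = 0.50)

# Hypothetical helper: variable pairs whose absolute weight exceeds the cutoff
flag_pairs <- function(weights, method = "wTO", sig = default_sig[[method]]) {
  # Look at the upper triangle only so each pair is reported once
  idx <- which(abs(weights) > sig & upper.tri(weights), arr.ind = TRUE)
  data.frame(
    var1 = rownames(weights)[idx[, "row"]],
    var2 = colnames(weights)[idx[, "col"]],
    weight = weights[idx],
    stringsAsFactors = FALSE
  )
}

# Made-up weight matrix (illustrative only)
vars <- c("x1", "x2", "x3")
w <- matrix(c(0,    0.30, 0.10,
              0.30, 0,    0.05,
              0.10, 0.05, 0),
            nrow = 3, byrow = TRUE, dimnames = list(vars, vars))

flagged_wto <- flag_pairs(w, method = "wTO")
flagged_cor <- flag_pairs(w, method = "cor")
```

The same matrix flags the x1 and x2 pair under the "wTO" cutoff of .25 but nothing under the "cor" cutoff of .50, which shows why the default threshold depends on the method.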

key

Character vector. A vector with variable descriptions that correspond to the order of variables input into data. Defaults to NULL or the column names of data

reduce

Boolean. Should redundancy reduction be performed? Defaults to TRUE. Set to FALSE for redundancy analysis only

auto

Boolean. Should redundancy reduction be automated? Defaults to TRUE. Set to FALSE for manual selection

label.latent

Boolean. Should latent variables be labelled? Defaults to FALSE, which gives arbitrary labelling (i.e., "LV_"). Set to TRUE to label latent variables

reduce.method

Character. How should data be reduced? Defaults to "latent"

"latent" Redundant variables will be combined into a latent variable
"remove" All but one redundant variable will be removed
"sum" Redundant variables are combined by summing across cases (rows)
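The "sum" and "remove" options can be illustrated on a toy data frame in base R. This is a sketch of the idea only: the data, the name of the combined column (`q1_q2`), and the choice of which variable to keep are assumptions, and the "latent" option is not shown because it relies on lavaan.

```r
# Made-up item responses (illustrative only)
items <- data.frame(
  q1 = c(1, 2, 3, 4),
  q2 = c(2, 2, 4, 5),
  q3 = c(5, 1, 2, 3)
)

# Assume q1 and q2 were flagged as redundant with one another
redundant <- c("q1", "q2")

# "sum": replace the redundant set with its row-wise sum
summed <- items[, setdiff(names(items), redundant), drop = FALSE]
summed$q1_q2 <- rowSums(items[, redundant])

# "remove": keep one variable from the set (here the first) and drop the rest
removed <- items[, setdiff(names(items), redundant[-1]), drop = FALSE]
```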

lavaan.args

List. If reduce.method = "latent", then lavaan's cfa function will be used to create latent variables to reduce variables. Arguments should be input as a list. Some example arguments (see lavOptions for full details):

estimator Estimator to use for latent variables (see Estimators for more details). Defaults to "MLR" for continuous data and "WLSMV" for mixed and categorical data. Data are considered continuous if they have 6 or more categories (see Rhemtulla, Brosseau-Liard, & Savalei, 2012)
missing How missing data should be handled. Defaults to "fiml"
std.lv If TRUE, the metric of each latent variable is determined by fixing their (residual) variances to 1.0. If FALSE, the metric of each latent variable is determined by fixing the factor loading of the first indicator to 1.0. If there are multiple groups, std.lv = TRUE, and "loadings" is included in the group.equal argument, then only the latent variances of the first group will be fixed to 1.0, while the latent variances of other groups are set free. Defaults to TRUE
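The default estimator rule described above (6 or more categories counts as continuous) can be sketched as a small helper. `choose_estimator` is hypothetical and only mirrors the stated rule for data frames; it is not the package's internal code.

```r
# Hypothetical helper: "MLR" when every variable has 6 or more observed
# categories, otherwise "WLSMV"
choose_estimator <- function(data, min_categories = 6) {
  n_cat <- vapply(
    data,
    function(x) length(unique(x[!is.na(x)])),
    integer(1)
  )
  if (all(n_cat >= min_categories)) "MLR" else "WLSMV"
}

# Made-up 5-point and 7-point response data (illustrative only)
likert5 <- data.frame(a = rep(1:5, 4), b = rep(5:1, 4))
likert7 <- data.frame(a = rep(1:7, 3), b = rep(7:1, 3))

# Arguments in the list form expected by lavaan.args
lavaan_args <- list(
  estimator = choose_estimator(likert5),
  std.lv = TRUE
)
```

A list built this way could then be supplied as `lavaan.args = lavaan_args` when reduce.method = "latent".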

adhoc

Boolean. Should adhoc check of redundancies be performed? Defaults to TRUE. If TRUE, adhoc check will run the redundancy analysis on the reduced variable set to determine if there are any remaining redundancies. This check is performed with the arguments: method = "wTO", type = "threshold", and sig = .20. This check is based on Christensen, Garrido, and Golino's (2020) simulation where these parameters were found to be the most conservative, demonstrating few false positives and false negatives

plot.redundancy

Boolean. Should redundancies be plotted in a network plot? Defaults to FALSE

plot.args

List. Arguments to be passed onto ggnet2. Defaults:

vsize = 6 Changes node size
alpha = 0.4 Changes transparency
label.size = 5 Changes label size
edge.alpha = 0.7 Changes edge transparency
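To change only some of these settings, pass a partial list. One common way to picture how a partial list combines with the defaults is `modifyList`; how UVA merges the two internally is an assumption here, and the object names are illustrative.

```r
# Defaults as listed above
default_plot_args <- list(
  vsize = 6,
  alpha = 0.4,
  label.size = 5,
  edge.alpha = 0.7
)

# User overrides for two of the settings
user_plot_args <- list(vsize = 8, edge.alpha = 0.5)

# Overrides replace matching defaults; the remaining defaults are kept
plot_args <- modifyList(default_plot_args, user_plot_args)

str(plot_args)
```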

Author

Alexander Christensen <alexpaulchristensen@gmail.com>

References

# Simulation using UVA
Christensen, A. P., Garrido, L. E., & Golino, H. (under review). Unique Variable Analysis: A novel approach for detecting redundant variables in multivariate data. PsyArXiv.

# Implementation of UVA (formerly node.redundant)
Christensen, A. P., Golino, H., & Silvia, P. J. (2020). A psychometric network perspective on the validity and validation of personality trait questionnaires. European Journal of Personality, 34, 1095-1108.

# wTO measure
Nowick, K., Gernat, T., Almaas, E., & Stubbs, L. (2009). Differences in human and chimpanzee gene expression patterns define an evolving network of transcription factors in brain. Proceedings of the National Academy of Sciences, 106, 22358-22363.

# Selection of CFA Estimator
Rhemtulla, M., Brosseau-Liard, P. E., & Savalei, V. (2012). When can categorical variables be treated as continuous? A comparison of robust continuous and categorical SEM estimation methods under suboptimal conditions. Psychological Methods, 17, 354-373.

Examples


# Select Five Factor Model personality items only
idx <- na.omit(match(gsub("-", "", unlist(psychTools::spi.keys[1:5])), colnames(psychTools::spi)))
items <- psychTools::spi[,idx]

# Change names in redundancy output to each item's description
key.ind <- match(colnames(items), as.character(psychTools::spi.dictionary$item_id))
key <- as.character(psychTools::spi.dictionary$item[key.ind])

if (FALSE) {
  # Automated selection of local dependence (default)
  uva.results <- UVA(data = items, key = key)

  # Produce Methods section
  methods.section(uva.results)
}

# Manual selection of local dependence
if (interactive()) {
  uva.results <- UVA(data = items, key = key, auto = FALSE)
}
