Identifies redundant variables in a multivariate dataset using several association methods and types of significance values (see Christensen, Garrido, & Golino, 2020 for more details).
UVA(
data,
n = NULL,
model = c("glasso", "TMFG"),
corr = c("cor_auto", "pearson", "spearman"),
method = c("cor", "pcor", "wTO"),
type = c("adapt", "alpha", "threshold"),
sig,
key = NULL,
reduce = TRUE,
auto = TRUE,
label.latent = FALSE,
reduce.method = c("latent", "remove", "sum"),
lavaan.args = list(),
adhoc = TRUE,
plot.redundancy = FALSE,
plot.args = list()
)
Returns a list:
redundancy
A list containing several objects:
redundant
Vectors nested within the list, each named for a node and containing the nodes that are redundant with it
data
Original data
correlation
Correlation matrix of original data
weights
Weights determined by weighted topological overlap, partial correlation, or zero-order correlation
network
If method = "wTO", then the network computed following EGA with EBICglasso network estimation
plot
If plot.redundancy = TRUE, then a plot of all redundancies found
descriptives
basic
A vector containing the mean, standard deviation, median, median absolute deviation (MAD), 3 times the MAD, 6 times the MAD, minimum, maximum, and critical value for the overlap measure (i.e., weighted topological overlap, partial correlation, or threshold)
centralTendency
A matrix for all (absolute) non-zero values and their respective standard deviation from the mean and median absolute deviation from the median
method
Returns the method argument
type
Returns the type argument
distribution
If type != "threshold", then the distribution that was used to determine significance
If reduce = TRUE, then a list containing:
data
New data with redundant variables merged or removed
merged
A matrix containing the variables that were decided to be redundant with one another
method
Method used to perform redundancy reduction
If adhoc = TRUE, then the adhoc check containing the same objects as in the redundancy list object in the output
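As a rough illustration of the summary statistics listed under descriptives, the base R sketch below computes them for a made-up vector of overlap weights. The weights, the helper name describe_weights, and the use of R's default mad() scaling constant are assumptions for illustration; UVA's internal computation may differ.

```r
# Hypothetical helper: summarize the absolute, non-zero overlap weights
describe_weights <- function(w) {
  w <- abs(w[w != 0])   # keep absolute non-zero values only
  m <- mad(w)           # stats::mad with its default constant (an assumption)
  c(mean = mean(w), sd = sd(w), median = median(w),
    MAD = m, MAD3 = 3 * m, MAD6 = 6 * m,
    min = min(w), max = max(w))
}

# Made-up weights, not output from UVA
weights <- c(0, 0.02, -0.05, 0.10, 0.08, 0, 0.45)
round(describe_weights(weights), 3)
```

A weight far above the median plus several MADs (here, 0.45) is the kind of value that would stand out as a candidate redundancy.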
data
Matrix or data frame. Input can either be data or a correlation matrix
n
Numeric. If the input in data is a correlation matrix, then the sample size is required. Defaults to NULL
model
Character. A string indicating the method to use. Current options are:
glasso
Estimates the Gaussian graphical model using graphical LASSO with the extended Bayesian information criterion to select the optimal regularization parameter. This is the default method
TMFG
Estimates a Triangulated Maximally Filtered Graph
corr
Type of correlation matrix to compute. The default uses cor_auto. Current options are:
cor_auto
Computes the correlation matrix using the cor_auto function from qgraph.
pearson
Computes Pearson's correlation coefficient using pairwise complete observations via the cor function.
spearman
Computes Spearman's correlation coefficient using pairwise complete observations via the cor function.
method
Character. Computes weighted topological overlap ("wTO", using EBICglasso), partial correlations ("pcor"), or correlations ("cor"). Defaults to "wTO"
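To make the three measures concrete, here is a minimal base R sketch on simulated data. It computes zero-order correlations, partial correlations from the inverse correlation matrix, and weighted topological overlap using the general formula in Nowick et al. (2009). Note the substitution: UVA computes wTO on an EBICglasso network, whereas this sketch uses unregularized partial correlations as the network so that it needs no extra packages. The data and the function name wto are invented for illustration.

```r
set.seed(123)
n <- 500
f <- rnorm(n)

# x1 and x2 share a common source, so they should overlap strongly
dat <- data.frame(
  x1 = f + rnorm(n, sd = 0.3),
  x2 = f + rnorm(n, sd = 0.3),
  x3 = rnorm(n),
  x4 = rnorm(n)
)

# "cor": zero-order correlations
R <- cor(dat)

# "pcor": partial correlations from the inverse correlation matrix
P <- -cov2cor(solve(R))
diag(P) <- 0

# "wTO": weighted topological overlap of a signed network A
# (stand-in network: P above, not an EBICglasso estimate)
wto <- function(A) {
  diag(A) <- 0
  k <- rowSums(abs(A))       # node connectivity
  shared <- A %*% A          # sum over u of a_iu * a_uj
  W <- (shared + A) / (outer(k, k, pmin) + 1 - abs(A))
  diag(W) <- 0
  W
}

W <- wto(P)
round(W, 2)
```

In this toy example the x1-x2 entry dominates all three matrices, which is the pattern a redundancy analysis is meant to pick up.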
type
Character. Type of significance. Computes significance using the standard p-value ("alpha"), adaptive alpha p-value (adapt.a), or some threshold ("threshold"). Defaults to "threshold"
sig
Numeric. p-value for significance of overlap (defaults to .05). Defaults for "threshold" for each method:
"wTO"
.25
"pcor"
.35
"cor"
.50
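The threshold defaults above amount to a lookup followed by a cutoff on the absolute weights. A minimal sketch, assuming a symmetric weight matrix: the matrix values and the helper flag_redundant are made up, whether the comparison is strict or inclusive is an assumption, and UVA's own bookkeeping may differ.

```r
# Default thresholds per method, as documented above
default_sig <- c(wTO = 0.25, pcor = 0.35, cor = 0.50)

# Hypothetical helper: list variable pairs whose absolute weight meets the cutoff
flag_redundant <- function(weights, method = c("wTO", "pcor", "cor")) {
  method <- match.arg(method)
  cutoff <- default_sig[[method]]
  # Look at each pair once (upper triangle); ">=" is an assumption
  idx <- which(abs(weights) >= cutoff & upper.tri(weights), arr.ind = TRUE)
  data.frame(
    node1 = rownames(weights)[idx[, "row"]],
    node2 = colnames(weights)[idx[, "col"]],
    weight = weights[idx],
    stringsAsFactors = FALSE
  )
}

# Made-up symmetric weights for three variables
w <- matrix(c(0,   .40, .05,
              .40, 0,   .28,
              .05, .28, 0),
            nrow = 3, dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

flag_redundant(w, "wTO")   # flags a-b and b-c
flag_redundant(w, "pcor")  # flags a-b only
flag_redundant(w, "cor")   # flags nothing
```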
key
Character vector. A vector with variable descriptions that correspond to the order of variables input into data. Defaults to NULL or the column names of data
reduce
Boolean. Should redundancy reduction be performed? Defaults to TRUE. Set to FALSE for redundancy analysis only
auto
Boolean. Should redundancy reduction be automated? Defaults to TRUE. Set to FALSE for manual selection
label.latent
Boolean. Should latent variables be labelled? Defaults to FALSE, which applies arbitrary labelling (i.e., "LV_"). Set to TRUE to label latent variables
reduce.method
Character. How should data be reduced? Defaults to "latent"
"latent"
Redundant variables will be combined into a latent variable
"remove"
All but one redundant variable will be removed
"sum"
Redundant variables are combined by summing across cases (rows)
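For the two non-latent options, the reduction is simple enough to sketch in base R. The toy data, the redundant set, the helper reduce_set, and the choice to keep the first variable under "remove" are all assumptions for illustration; in UVA the kept variable is selected automatically or by the user, and the "latent" option additionally relies on lavaan.

```r
# Toy data: x1 and x2 are treated as a redundant set
dat <- data.frame(x1 = c(1, 2, 3, 4), x2 = c(2, 2, 4, 5), x3 = c(5, 1, 3, 2))
redundant_set <- c("x1", "x2")

# Hypothetical helper mirroring the "remove" and "sum" options
reduce_set <- function(data, set, how = c("remove", "sum"), new_name = "LV_1") {
  how <- match.arg(how)
  out <- data[, setdiff(names(data), set), drop = FALSE]
  if (how == "remove") {
    # Keep one variable from the set (here, simply the first) and drop the rest
    out[[set[1]]] <- data[[set[1]]]
  } else {
    # Sum the redundant variables within each case (row)
    out[[new_name]] <- unname(rowSums(data[, set, drop = FALSE]))
  }
  out
}

reduce_set(dat, redundant_set, "remove")
reduce_set(dat, redundant_set, "sum")
```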
lavaan.args
List. If reduce.method = "latent", then lavaan's cfa function will be used to create latent variables to reduce variables. Arguments should be input as a list. Some example arguments (see lavOptions for full details):
estimator
Estimator to use for latent variables (see Estimators for more details). Defaults to "MLR" for continuous data and "WLSMV" for mixed and categorical data. Data are considered continuous if they have 6 or more categories (see Rhemtulla, Brosseau-Liard, & Savalei, 2012)
missing
How missing data should be handled. Defaults to "fiml"
std.lv
If TRUE, the metric of each latent variable is determined by fixing their (residual) variances to 1.0. If FALSE, the metric of each latent variable is determined by fixing the factor loading of the first indicator to 1.0. If there are multiple groups, std.lv = TRUE, and "loadings" is included in the group.equal argument, then only the latent variances of the first group will be fixed to 1.0, while the latent variances of other groups are set free. Defaults to TRUE
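The estimator default described above can be read as a rule on the number of response categories. Below is a minimal sketch of that rule. Counting distinct observed values per column is an assumption about how categories are determined, and the helper name pick_estimator is hypothetical; the final list uses only lavaan.args entries documented above.

```r
# Hypothetical helper: "MLR" if every variable has 6 or more categories,
# otherwise "WLSMV" (mixed or categorical data)
pick_estimator <- function(data, min_categories = 6) {
  n_cat <- vapply(data, function(x) length(unique(x[!is.na(x)])), integer(1))
  if (all(n_cat >= min_categories)) "MLR" else "WLSMV"
}

set.seed(42)
continuous <- data.frame(a = rnorm(50), b = rnorm(50))
likert <- data.frame(a = sample(1:5, 50, replace = TRUE),
                     b = sample(1:5, 50, replace = TRUE))

pick_estimator(continuous)  # "MLR"
pick_estimator(likert)      # "WLSMV"

# Arguments in the shape expected by lavaan.args
lavaan.args <- list(estimator = pick_estimator(likert), std.lv = TRUE)
```

Passing estimator explicitly in lavaan.args would override whatever default UVA selects, which can be useful when a scale has six or more categories but is still best treated as ordinal.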
adhoc
Boolean. Should adhoc check of redundancies be performed? Defaults to TRUE. If TRUE, the adhoc check will run the redundancy analysis on the reduced variable set to determine if there are any remaining redundancies. This check is performed with the arguments: method = "wTO", type = "threshold", and sig = .20. This check is based on Christensen, Garrido, and Golino's (2020) simulation where these parameters were found to be the most conservative, demonstrating few false positives and false negatives
plot.redundancy
Boolean. Should redundancies be plotted in a network plot? Defaults to FALSE
plot.args
List. Arguments to be passed onto ggnet2. Defaults:
vsize = 6
Changes node size
alpha = 0.4
Changes transparency
label.size = 5
Changes label size
edge.alpha = 0.7
Changes edge transparency
Alexander Christensen <alexpaulchristensen@gmail.com>
# Simulation using UVA
Christensen, A. P., Garrido, L. E., & Golino, H. (under review).
Unique Variable Analysis: A novel approach for detecting redundant variables in multivariate data.
PsyArXiv.
# Implementation of UVA (formerly node.redundant)
Christensen, A. P., Golino, H., & Silvia, P. J. (2020).
A psychometric network perspective on the validity and validation of personality trait questionnaires.
European Journal of Personality, 34, 1095-1108.
# wTO measure
Nowick, K., Gernat, T., Almaas, E., & Stubbs, L. (2009).
Differences in human and chimpanzee gene expression patterns define an evolving network of transcription factors in brain.
Proceedings of the National Academy of Sciences, 106, 22358-22363.
# Selection of CFA Estimator
Rhemtulla, M., Brosseau-Liard, P. E., & Savalei, V. (2012).
When can categorical variables be treated as continuous? A comparison of robust continuous and categorical SEM estimation methods under suboptimal conditions.
Psychological Methods, 17, 354-373.
# Select Five Factor Model personality items only
idx <- na.omit(match(gsub("-", "", unlist(psychTools::spi.keys[1:5])), colnames(psychTools::spi)))
items <- psychTools::spi[,idx]
# Change names in redundancy output to each item's description
key.ind <- match(colnames(items), as.character(psychTools::spi.dictionary$item_id))
key <- as.character(psychTools::spi.dictionary$item[key.ind])
if (FALSE) {
  # Automated selection of local dependence (default)
  uva.results <- UVA(data = items, key = key)

  # Produce Methods section
  methods.section(uva.results)
}

# Manual selection of local dependence
if (interactive()) {
  uva.results <- UVA(data = items, key = key, auto = FALSE)
}