Make Variable Clustering Quarto Report Section
vClus(
d,
exclude = NULL,
corrmatrix = FALSE,
redundancy = FALSE,
spc = FALSE,
trans = FALSE,
rexclude = NULL,
fracmiss = 0.2,
maxlevels = 10,
minprev = 0.05,
imputed = NULL,
horiz = FALSE,
label = "fig-varclus",
print = TRUE,
redunargs = NULL,
spcargs = NULL,
transaceargs = NULL,
transacefile = NULL,
spcfile = NULL
)
Makes Quarto tabs and prints output, returning nothing unless spc=TRUE or trans=TRUE is used, in which case a list with components princmp and/or transace is returned. These components can be passed to the special print and plot methods for spc or to ggplot_transace, so the user can put scree plots and PC loading plots in separate code chunks that use different figure sizes.
d: a data frame or table
exclude: formula or vector of character strings containing variables to exclude from analysis
corrmatrix: set to TRUE to use Hmisc::plotCorrM() to depict a Spearman rank correlation matrix
redundancy: set to TRUE to run Hmisc::redun() on non-excluded variables
spc: set to TRUE to run Hmisc::princmp() to do a sparse principal component analysis, with the argument method='sparse' passed
trans: set to TRUE to run Hmisc::transace() to transform each predictor before running redundancy or principal components analysis. transace is run on the stacked filled-in data if imputed is given.
rexclude: extra variables to exclude from transace transformation-finding, redundancy analysis, and sparse principal components (formula or character vector)
fracmiss: if the fraction of NAs for a variable exceeds this, the variable will not be included
maxlevels: if the maximum number of distinct values for a categorical variable exceeds this, the variable will be dropped
minprev: the minimum proportion of non-missing observations in a category for a binary variable to be retained, and the minimum relative frequency of a category before it will be combined with other small categories
imputed: an object created by Hmisc::aregImpute() or mice::mice() that contains information from multiple imputation. It causes vClus to create all the filled-in datasets, stack them into one tall dataset, and pass that dataset to Hmisc::redun() or Hmisc::princmp() so that NAs can be handled efficiently in redundancy analysis and sparse principal components, i.e., without excluding partial records. Variable clustering and the correlation matrix are already efficient because they use pairwise deletion of NAs.
horiz: set to TRUE to draw the dendrogram horizontally
label: figure label for Quarto
print: set to FALSE to not let dataframeReduce report details
redunargs: a list() of other arguments passed to Hmisc::redun()
spcargs: a list() of other arguments passed to Hmisc::princmp()
transaceargs: a list() of other arguments passed to Hmisc::transace()
transacefile: similar to spcfile and can be used when trans=TRUE
spcfile: a character string specifying an .rds R binary file to hold the results of sparse principal component analysis. Using Hmisc::runifChanged(), if the file name is specified and no inputs have changed since the last run, the result is read from the file. Otherwise a new run is made and the file is recreated if spcfile is specified. This is done because sparse principal components can take several minutes to run on large files.
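The caching behavior described for spcfile can be pictured with a small base R sketch. This is only an illustration of the idea of reusing an .rds result when inputs are unchanged; cached_run and slow_step are hypothetical names, and the actual mechanism inside Hmisc::runifChanged() may well differ.

```r
# Hypothetical helper: not part of Hmisc or vClus; it only illustrates the caching idea
cached_run <- function(file, inputs, fun) {
  if (file.exists(file)) {
    saved <- readRDS(file)
    # Reuse the stored result only when the inputs are unchanged
    if (identical(saved$inputs, inputs)) return(saved$result)
  }
  result <- fun(inputs)
  # Store the inputs next to the result so the next call can compare them
  saveRDS(list(inputs = inputs, result = result), file)
  result
}

f <- tempfile(fileext = ".rds")
n_runs <- 0
# Stand-in for an expensive computation; counts how often it really runs
slow_step <- function(x) { n_runs <<- n_runs + 1; colMeans(x) }

m  <- as.matrix(mtcars[, c("mpg", "hp", "wt")])
r1 <- cached_run(f, m, slow_step)        # computes and writes the file
r2 <- cached_run(f, m, slow_step)        # inputs unchanged: read from the file
r3 <- cached_run(f, m[-1, ], slow_step)  # inputs changed: recomputed
```

Here the expensive step runs only for the first and third calls, which is the saving that matters when a sparse principal component analysis takes minutes.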
Frank Harrell
Draws a variable clustering dendrogram and optionally graphically depicts a correlation matrix. See this for an example. Uses Hmisc::varclus().
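To make the idea of a variable clustering dendrogram concrete, here is a minimal base R sketch. It assumes squared Spearman rank correlation as the similarity measure and complete-linkage clustering; these choices and the use of the built-in airquality data are assumptions for illustration, and the defaults of Hmisc::varclus() should be checked in its own documentation.

```r
# Four numeric variables; Ozone and Solar.R contain NAs
x <- airquality[, c("Ozone", "Solar.R", "Wind", "Temp")]

# Squared Spearman correlations, using pairwise deletion of NAs
rho2 <- cor(x, method = "spearman", use = "pairwise.complete.obs")^2

# Turn similarity into a distance and cluster the variables
hc <- hclust(as.dist(1 - rho2), method = "complete")

# Dendrogram of variables; the height is 1 minus the squared correlation
plot(hc, main = "", xlab = "", sub = "", ylab = "1 - squared Spearman rho")
```

Pairwise deletion lets every pair of variables use all observations where both are present, which is why no imputation is needed for this step.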
Hmisc::varclus(), Hmisc::plotCorrM(), Hmisc::dataframeReduce(), Hmisc::redun(), Hmisc::princmp(), Hmisc::transace()
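The screening rules described for fracmiss and maxlevels can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the logic of Hmisc::dataframeReduce(): screen_vars and the toy data frame are hypothetical, and the minprev handling of small categories is left out.

```r
# Hypothetical helper illustrating the fracmiss and maxlevels rules only
screen_vars <- function(d, fracmiss = 0.2, maxlevels = 10) {
  # Variables whose fraction of NAs exceeds fracmiss
  too_missing <- vapply(d, function(v) mean(is.na(v)) > fracmiss, logical(1))
  # Categorical variables with more than maxlevels distinct non-missing values
  too_many <- vapply(d, function(v)
    (is.factor(v) || is.character(v)) &&
      length(unique(v[!is.na(v)])) > maxlevels,
    logical(1))
  d[, !(too_missing | too_many), drop = FALSE]
}

d <- data.frame(
  age = c(50, 61, NA, 47, 58, 66, 39, 72, 55, 60),         # 10% missing
  lab = c(NA, NA, NA, 1.2, 0.9, NA, 1.1, 1.4, 1.0, 0.8),   # 40% missing
  id  = sprintf("p%02d", 1:10),                            # 10 distinct values
  sex = c("f", "m", "f", "m", "f", "f", "m", "m", "f", "m"),
  stringsAsFactors = FALSE
)

kept <- names(screen_vars(d, maxlevels = 5))
```

With these toy data, lab is dropped for exceeding the missingness threshold and id for having too many distinct values once maxlevels is lowered to 5.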
if (FALSE) {
# Not run: mydata is a placeholder data frame; .q() quotes the unevaluated names
vClus(mydata, exclude=.q(country, city))
}