vClus: vClus

Description

Make Variable Clustering Quarto Report Section

Usage

vClus(
  d,
  exclude = NULL,
  corrmatrix = FALSE,
  redundancy = FALSE,
  spc = FALSE,
  trans = FALSE,
  rexclude = NULL,
  fracmiss = 0.2,
  maxlevels = 10,
  minprev = 0.05,
  imputed = NULL,
  horiz = FALSE,
  label = "fig-varclus",
  print = TRUE,
  redunargs = NULL,
  spcargs = NULL,
  transaceargs = NULL,
  transacefile = NULL,
  spcfile = NULL
)

Value

Makes Quarto tabs and prints output. Nothing is returned unless spc=TRUE or trans=TRUE, in which case a list with components princmp and/or transace is returned. These components can be passed to the special print and plot methods for spc or to ggplot_transace, so the user can put scree plots and PC loading plots in separate code chunks that use different figure sizes.
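The following is a minimal sketch of that pattern, not a tested recipe. It assumes a Quarto document with qreport and Hmisc installed, a placeholder data frame named mydata, and that the plot method for the princmp component accepts which='scree' and which='loadings'. It is wrapped in if (FALSE), like the example on this page, because it needs real data and a Quarto rendering context.

```r
if (FALSE) {
  # Chunk 1: build the report section and keep the returned list.
  # mydata is a placeholder; spc=TRUE requests sparse principal components.
  r <- vClus(mydata, spc = TRUE)

  # Chunk 2 (its own figure size, e.g. via '#| fig-height: 3'):
  # scree plot from the princmp component of the returned list.
  plot(r$princmp, which = "scree")

  # Chunk 3 (a taller figure, e.g. via '#| fig-height: 6'):
  # loadings plot from the same component.
  plot(r$princmp, which = "loadings")
}
```

Keeping the result in an object is what allows each plot to live in a chunk with its own figure dimensions.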

Arguments

d: a data frame or table
exclude: formula or vector of character strings containing variables to exclude from analysis
corrmatrix: set to TRUE to use Hmisc::plotCorrM() to depict a Spearman rank correlation matrix.
redundancy: set to TRUE to run Hmisc::redun() on non-excluded variables
spc: set to TRUE to run Hmisc::princmp() to do a sparse principal component analysis with the argument method='sparse' passed
trans: set to TRUE to run Hmisc::transace() to transform each predictor before running redundancy or principal components analysis. transace is run on the stacked filled-in data if imputed is given.
rexclude: extra variables to exclude from transace transformation-finding, redundancy analysis, and sparse principal components (formula or character vector)
fracmiss: if the fraction of NAs for a variable exceeds this the variable will not be included
maxlevels: if the maximum number of distinct values for a categorical variable exceeds this, the variable will be dropped
minprev: the minimum proportion of non-missing observations in a category for a binary variable to be retained, and the minimum relative frequency of a category before it will be combined with other small categories
imputed: an object created by Hmisc::aregImpute() or mice::mice() that contains information from multiple imputation. When given, vClus creates all the filled-in datasets, stacks them into one tall dataset, and passes that dataset to Hmisc::redun() or Hmisc::princmp() so that NAs can be handled efficiently in redundancy analysis and sparse principal components, i.e., without excluding partial records. Variable clustering and the correlation matrix are already efficient because they use pairwise deletion of NAs.
horiz: set to TRUE to draw the dendrogram horizontally
label: figure label for Quarto
print: set to FALSE to not let dataframeReduce report details
redunargs: a list() of other arguments passed to Hmisc::redun()
spcargs: a list() of other arguments passed to Hmisc::princmp()
transaceargs: a list() of other arguments passed to Hmisc::transace()
transacefile: similar to spcfile and can be used when trans=TRUE
spcfile: a character string specifying an .rds R binary file to hold the results of sparse principal component analysis. Using Hmisc::runifChanged(), if the file name is specified and no inputs have changed since the last run, the result is read from the file. Otherwise a new run is made and the file is recreated if spcfile is specified. This is done because sparse principal components can take several minutes to run on large files.
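To make the fracmiss and maxlevels screening rules concrete, here is a base R sketch of the kind of filtering they describe. It is an illustration only, not the code vClus runs (the print argument suggests the real work is done by dataframeReduce). The data frame, its column names, and the maxlevels value of 5 are made up for the example, and the minprev rule is not covered.

```r
# Hypothetical data: 10 rows, one mostly-missing variable and one
# categorical variable with too many distinct values.
d <- data.frame(
  age = c(34, 51, NA, 47, 29, 62, 58, 41, 39, 55),          # 10% missing
  lab = c(NA, NA, NA, 1.2, 0.9, NA, 1.1, 1.4, 1.0, 0.8),    # 40% missing
  id  = sprintf("pt%02d", 1:10),                            # 10 distinct values
  sex = c("f", "m", "f", "f", "m", "m", "f", "m", "f", "m"),
  stringsAsFactors = FALSE
)

fracmiss  <- 0.2   # same as the vClus default
maxlevels <- 5     # illustrative value; the vClus default is 10

# Fraction of NAs per variable
frac_na <- vapply(d, function(x) mean(is.na(x)), numeric(1))

# Number of distinct non-missing values, counted for categorical variables only
n_levels <- vapply(d, function(x) {
  if (is.character(x) || is.factor(x)) length(unique(x[!is.na(x)])) else 0L
}, integer(1))

# A variable is kept only if it exceeds neither threshold
keep <- names(d)[frac_na <= fracmiss & n_levels <= maxlevels]
print(keep)
```

In this sketch lab is dropped for missingness and id for having too many levels, which mirrors how the two thresholds act independently.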

Author

Frank Harrell

Details

Draws a variable clustering dendrogram and optionally graphically depicts a correlation matrix. See this for an example. Uses Hmisc::varclus().
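For intuition about what the dendrogram represents, the sketch below does a similar computation in base R: hierarchical clustering on one minus the squared Spearman correlation, with pairwise deletion of NAs. This is an approximation of the idea behind Hmisc::varclus() under its usual defaults as far as I know them, not its implementation, and the simulated variables x1 to x4 are invented for the example.

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)        # closely tracks x1
x3 <- rnorm(n)
x4 <- exp(x3) + rnorm(n, sd = 0.2)   # monotonic but nonlinear in x3
X  <- cbind(x1, x2, x3, x4)

# Similarity: squared Spearman rank correlation, pairwise deletion of NAs
sim <- cor(X, method = "spearman", use = "pairwise.complete.obs")^2

# Cluster on dissimilarity = 1 - similarity
hc <- hclust(as.dist(1 - sim), method = "complete")
plot(hc, ylab = "1 - squared Spearman correlation")

# Cut the tree into two clusters of variables
groups <- cutree(hc, k = 2)
print(groups)
```

Because the similarity is rank based, the nonlinear pair x3 and x4 can cluster as tightly as a linear pair would.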

Examples

if (FALSE) {
# mydata is a placeholder data frame; country and city are left out of the analysis
vClus(mydata, exclude=.q(country, city))
}