simulate_r_database: Simulate correlation databases of primary studies

Description

The simulate_r_database function generates databases of psychometric correlation data from sample-size parameters, correlation parameters, reliability parameters, and selection-ratio parameters. The output database can be provided in either a long format or a wide format. If composite variables are to be formed, parameters can also be defined for the weights used to form the composites as well as the selection ratios applied to the composites. This function will return a database of statistics as well as a database of parameters - the parameter database contains the actual study parameters for each simulated sample (without sampleing error) to allow comparisons between meta-analytic results computed from the statistics and the actual means and variances of parameters. The merge_simdat_r function can be used to merge multiple simulated databases and the sparsify_simdat_r function can be used to randomly delete artifact information (a procedure commonly done in simulations of artifact-distribution methods).

Usage

simulate_r_database(
  k,
  n_params,
  rho_params,
  mu_params = 0,
  sigma_params = 1,
  rel_params = 1,
  sr_params = 1,
  k_items_params = 1,
  wt_params = NULL,
  allow_neg_wt = FALSE,
  sr_composite_params = NULL,
  var_names = NULL,
  composite_names = NULL,
  n_as_ni = FALSE,
  show_applicant = FALSE,
  keep_vars = NULL,
  decimals = 2,
  format = "long",
  max_iter = 100,
  ...
)

Value

A database of simulated primary studies' statistics and analytically determined parameter values.

Arguments

k: Number of studies to simulate.
n_params: Parameter distribution (or data-generation function; see details) for sample size.
rho_params: List of parameter distributions (or data-generation functions; see details) for correlations. If simulating data from a single fixed population matrix, that matrix can be supplied for this argument (if the diagonal contains non-unity values and 'sigma_params' is not specified, those values will be used as variances).
mu_params: List of parameter distributions (or data-generation functions; see details) for means.
sigma_params: List of parameter distributions (or data-generation functions; see details) for standard deviations.
rel_params: List of parameter distributions (or data-generation functions; see details) for reliabilities.
sr_params: List of parameter distributions (or data-generation functions; see details) for selection ratios.
k_items_params: List of parameter distributions (or data-generation functions; see details) for the number of test items comprising each of the variables to be simulated (all are single-item variables by default).
wt_params: List of parameter distributions (or data-generation functions; see details) to create weights for use in forming composites. If multiple composites are formed, the list should be a list of lists, with the general format: list(comp1_params = list(...params...), comp2_params = list(...params...), etc.).
allow_neg_wt: Logical scalar that determines whether negative weights should be allowed (TRUE) or not (FALSE).
sr_composite_params: Parameter distributions (or data-generation functions; see details) for composite selection ratios.
var_names: Optional vector of variable names for all non-composite variables.
composite_names: Optional vector of names for composite variables.
n_as_ni: Logical argument determining whether n specifies the incumbent sample size (TRUE) or the applicant sample size (FALSE; default). This can only be TRUE when only one variable is involved in selection.
show_applicant: Should applicant data be shown for sample statistics (TRUE) or suppressed (FALSE)?
keep_vars: Optional vector of variable names to be extracted from the simulation and returned in the output object. All variables are returned by default. Use this argument when only some variables are of interest and others are generated solely to serve as selection variables.
decimals: Number of decimals to which statistical results (not parameters) should be rounded. Rounding to 2 decimal places best captures the precision of data available from published primary research.
format: Database format: "long" or "wide."
max_iter: Maximum number of iterations to allow in the parameter selection process before terminating with convergence failure. Must be finite.
...: Additional arguments.

Details

Values supplied as any argument with the suffix "params" can take any of three forms (see Examples for a demonstration of usage):

A vector of values from which study parameters should be sampled.
A vector containing a mean with a variance or standard deviation. These values must be named "mean," "var," and "sd", respectively, for the program to recognize which value is which.
A matrix containing a row of values (this row must be named "values") from which study parameters should be sampled and a row of weights (this row must be labeled 'weights') associated with the values to be sampled.
A matrix containing a column of values (this column must be named "values") from which study parameters should be sampled and a column of weights (this column must be labeled 'weights') associated with the values to be sampled.
A function that is configured to generate data using only one argument that defines the number of cases to generate, e.g., fun(n = 10).

Examples

Run this code

if (FALSE) {
## Note the varying methods for defining parameters:
n_params = function(n) rgamma(n, shape = 100)
rho_params <- list(c(.1, .3, .5),
                   c(mean = .3, sd = .05),
                   rbind(value = c(.1, .3, .5), weight = c(1, 2, 1)))
rel_params = list(c(.7, .8, .9),
                  c(mean = .8, sd = .05),
                  rbind(value = c(.7, .8, .9), weight = c(1, 2, 1)))
sr_params = c(list(1, 1, c(.5, .7)))
sr_composite_params = list(1, c(.5, .6, .7))
wt_params = list(list(c(1, 2, 3),
                      c(mean = 2, sd = .25),
                      rbind(value = c(1, 2, 3), weight = c(1, 2, 1))),
                 list(c(1, 2, 3),
                      c(mean = 2, sd = .25),
                      cbind(value = c(1, 2, 3), weight = c(1, 2, 1))))

## Simulate with long format
simulate_r_database(k = 10, n_params = n_params, rho_params = rho_params,
                  rel_params = rel_params, sr_params = sr_params,
                  sr_composite_params = sr_composite_params, wt_params = wt_params,
                  var_names = c("X", "Y", "Z"), format = "long")

## Simulate with wide format
simulate_r_database(k = 10, n_params = n_params, rho_params = rho_params,
                  rel_params = rel_params, sr_params = sr_params,
                  sr_composite_params = sr_composite_params, wt_params = wt_params,
                  var_names = c("X", "Y", "Z"), format = "wide")
}

Run the code above in your browser using DataLab