simulate_d_database: Simulate d value databases of primary studies

Description

The simulate_d_database function generates databases of psychometric d value data from sample-size parameters, correlation parameters, mean parameters, standard deviation parameters, reliability parameters, and selection-ratio parameters. The output database can be provided in a long format. If composite variables are to be formed, parameters can also be defined for the weights used to form the composites as well as the selection ratios applied to the composites. This function returns a database of statistics as well as a database of parameters. The parameter database contains the actual study parameters for each simulated sample (without sampling error), which allows comparisons between meta-analytic results computed from the statistics and the actual means and variances of parameters. The merge_simdat_d function can be used to merge multiple simulated databases, and the sparsify_simdat_d function can be used to randomly delete artifact information (a procedure commonly done in simulations of artifact-distribution methods).

Usage

simulate_d_database(
  k,
  n_params,
  rho_params,
  mu_params = NULL,
  sigma_params = 1,
  rel_params = 1,
  sr_params = 1,
  k_items_params = 1,
  wt_params = NULL,
  allow_neg_wt = FALSE,
  sr_composite_params = NULL,
  group_names = NULL,
  var_names = NULL,
  composite_names = NULL,
  diffs_as_obs = FALSE,
  show_applicant = FALSE,
  keep_vars = NULL,
  decimals = 2,
  max_iter = 100,
  ...
)

Value

A database of simulated primary studies' statistics and analytically determined parameter values.

Arguments

k: Number of studies to simulate.
n_params: List of parameter distributions (or data-generation function; see details) for subgroup sample sizes.
rho_params: List containing a list of parameter distributions (or data-generation functions; see details) for correlations for each simulated group. If simulating data from a single fixed population matrix in each group, supply a list of those matrices for this argument (if the diagonals contain non-unity values and the 'sigma_params' argument is not specified, those values will be used as variances).
mu_params: List containing a list of parameter distributions (or data-generation functions; see details) for means for each simulated group. If NULL, all means will be set to zero.
sigma_params: List containing a list of parameter distributions (or data-generation functions; see details) for standard deviations for each simulated group. If NULL, all standard deviations will be set to unity.
rel_params: List containing a list of parameter distributions (or data-generation functions; see details) for reliabilities for each simulated group. If NULL, all reliabilities will be set to unity.
sr_params: List of parameter distributions (or data-generation functions; see details) for selection ratios. If NULL, all selection ratios will be set to unity.
k_items_params: List of parameter distributions (or data-generation functions; see details) for the number of test items comprising each of the variables to be simulated (all are single-item variables by default).
wt_params: List of parameter distributions (or data-generation functions; see details) to create weights for use in forming composites. If multiple composites are formed, the list should be a list of lists, with the general format: list(comp1_params = list(...params...), comp2_params = list(...params...), etc.).
allow_neg_wt: Logical scalar that determines whether negative weights should be allowed (TRUE) or not (FALSE).
sr_composite_params: Parameter distributions (or data-generation functions; see details) for composite selection ratios.
group_names: Optional vector of group names.
var_names: Optional vector of variable names for all non-composite variables.
composite_names: Optional vector of names for composite variables.
diffs_as_obs: Logical scalar that determines whether standard deviation parameters represent standard deviations of observed scores (TRUE) or of true scores (FALSE; default).
show_applicant: Should applicant data be shown for sample statistics (TRUE) or suppressed (FALSE)?
keep_vars: Optional vector of variable names to be extracted from the simulation and returned in the output object. All variables are returned by default. Use this argument when only some variables are of interest and others are generated solely to serve as selection variables.
decimals: Number of decimals to which statistical results (not parameters) should be rounded. Rounding to 2 decimal places best captures the precision of data available from published primary research.
max_iter: Maximum number of iterations to allow in the parameter selection process before terminating with convergence failure. Must be finite.
...: Additional arguments.
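
The structure of two of the arguments above can be sketched in base R. The first part builds a fixed population matrix with non-unity diagonals, as described under rho_params, and shows which variances and correlations it implies. The second part builds the list-of-lists layout described under wt_params. All object names and numeric values are illustrative. Placing one weight distribution per component variable inside each composite's list is an assumption and is not confirmed by this page.

```r
# Sketch only: base R, no psychmeta calls. Names and values are illustrative.

# A fixed population matrix with variances (not 1s) on the diagonal.
sigma_mat <- matrix(c(4.0, 1.2,
                      1.2, 2.25),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(c("y1", "y2"), c("y1", "y2")))

implied_sds <- sqrt(diag(sigma_mat))  # standard deviations implied by the diagonal
implied_cor <- cov2cor(sigma_mat)     # correlation matrix implied by sigma_mat

# One matrix per simulated group, supplied as a list.
rho_params_fixed <- list(sigma_mat, sigma_mat)

# List-of-lists layout for weights when two composites are formed.
# ASSUMPTION: each inner list holds one parameter distribution per component.
wt_params_sketch <- list(
  comp1_params = list(c(1, 2, 3), c(mean = 1, sd = .2)),
  comp2_params = list(c(1, 1), c(1, 2))
)
```

Inspecting implied_sds and implied_cor before running a simulation is a quick way to confirm that a supplied matrix encodes the intended variances.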

Details

Values supplied as any argument with the suffix "params" can take any of the following forms (see Examples for a demonstration of usage):

A vector of values from which study parameters should be sampled.
A vector containing a mean with a variance or standard deviation. These values must be named "mean", "var", and "sd", respectively, for the program to recognize which value is which.
A matrix containing a row of values (this row must be named "values") from which study parameters should be sampled and a row of weights (this row must be named "weights") associated with the values to be sampled.
A matrix containing a column of values (this column must be named "values") from which study parameters should be sampled and a column of weights (this column must be named "weights") associated with the values to be sampled.
A function that is configured to generate data using only one argument that defines the number of cases to generate, e.g., fun(n = 10).
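
As a minimal sketch, the forms listed above could be constructed in base R as follows. The object names and numeric values are illustrative and are not part of the package. The choice of runif for the function form is an arbitrary example.

```r
# Sketch only: base R objects in each accepted form. Names are illustrative.

# 1. A vector of values from which study parameters are sampled
rho_values <- c(.3, .4, .5)

# 2. A named vector with "mean" plus either "var" or "sd"
n_dist <- c(mean = 200, sd = 20)

# 3. A matrix with rows named "values" and "weights"
rel_by_row <- rbind(values  = c(.7, .8, .9),
                    weights = c(1, 2, 1))

# 4. A matrix with columns named "values" and "weights"
rel_by_col <- cbind(values  = c(.7, .8, .9),
                    weights = c(1, 2, 1))

# 5. A function whose single argument is the number of cases to generate
sr_fun <- function(n) runif(n, min = .5, max = 1)

# Structural checks on the names the program relies on
names(n_dist)
rownames(rel_by_row)
colnames(rel_by_col)
length(sr_fun(n = 10))
```

The row-wise and column-wise matrices carry the same information. Either orientation is recognized through the "values" and "weights" names.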

Examples

if (requireNamespace("nor1mix", quietly = TRUE)) {
  ## Define sample sizes, means, and other parameters for each of two groups:
  n_params <- list(c(mean = 200, sd = 20),
                   c(mean = 100, sd = 20))
  rho_params <- list(list(c(.3, .4, .5)),
                     list(c(.3, .4, .5)))
  mu_params <- list(list(c(mean = .5, sd = .5), c(-.5, 0, .5)),
                    list(c(mean = 0, sd = .5), c(-.2, 0, .2)))
  sigma_params <- list(list(1, 1),
                       list(1, 1))
  rel_params <- list(list(.8, .8),
                     list(.8, .8))
  sr_params <- list(1, .5)

  simulate_d_database(k = 5, n_params = n_params, rho_params = rho_params,
                      mu_params = mu_params, sigma_params = sigma_params,
                      rel_params = rel_params, sr_params = sr_params,
                      k_items_params = c(4, 4),
                      group_names = NULL, var_names = c("y1", "y2"),
                      show_applicant = TRUE, keep_vars = c("y1", "y2"), decimals = 2)
}
