cps.normal: Power simulations for cluster-randomized trials: Parallel Designs, Normal Outcome

Description

This function uses Monte Carlo methods (simulations) to estimate power for parallel design cluster-randomized trials with normal outcomes. Users can modify a variety of parameters to suit the simulations to their desired experimental situation.

Users must specify the desired number of simulations, number of subjects per cluster, number of clusters per arm, expected means of the arms, and two of the following: ICC, within-cluster variance, or between-cluster variance. Defaults are provided for significance level, analytic method, progress updates, and whether the simulated data sets are retained.

Users have the option of specifying different variance parameters for each arm, different numbers of clusters for each treatment group, and different numbers of units within each cluster.

Non-convergent models are not included in the calculation of exact confidence intervals.

Usage

cps.normal(
  nsim = NA,
  nclusters = NA,
  nsubjects = NA,
  mu = 0,
  mu2 = NA,
  ICC = NA,
  sigma_sq = NA,
  sigma_b_sq = NA,
  ICC2 = NA,
  sigma_sq2 = NA,
  sigma_b_sq2 = NA,
  alpha = 0.05,
  method = "glmm",
  quiet = FALSE,
  allSimData = FALSE,
  seed = NA,
  poorFitOverride = FALSE,
  timelimitOverride = TRUE,
  lowPowerOverride = FALSE,
  irgtt = FALSE,
  nofit = FALSE
)

Arguments

nsim

Number of datasets to simulate; accepts integer. Required.

nclusters

Number of clusters per condition; accepts single integer (implying equal numbers of clusters in the two groups) or vector of length 2 (unequal number of clusters per arm). Required.

nsubjects

Number of subjects per cluster; accepts either a scalar (implying equal cluster sizes for the two groups), a vector of length two (equal cluster sizes within arm), or a vector of length sum(nclusters) (unequal cluster sizes within arm). Required.
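The three accepted forms of nsubjects can be illustrated with a small sketch. The helper name expand_nsubjects is hypothetical and not part of clusterPower; it only mirrors the rules described above and makes no claim about how cps.normal handles the argument internally.

```r
# Hypothetical helper (not part of clusterPower): expand `nsubjects` into
# one cluster size per cluster, following the rules described above.
expand_nsubjects <- function(nsubjects, nclusters) {
  # A single nclusters value means the same number of clusters in both arms
  if (length(nclusters) == 1) nclusters <- rep(nclusters, 2)
  total <- sum(nclusters)

  if (length(nsubjects) == 1) {
    # Scalar: every cluster in both arms has the same size
    rep(nsubjects, total)
  } else if (length(nsubjects) == total) {
    # One size per cluster: first-arm clusters first, then second-arm clusters
    nsubjects
  } else if (length(nsubjects) == 2) {
    # One size per arm: repeat each size for that arm's clusters
    rep(nsubjects, times = nclusters)
  } else {
    stop("nsubjects must have length 1, 2, or sum(nclusters)")
  }
}

expand_nsubjects(25, 10)               # 20 clusters of 25 subjects
expand_nsubjects(c(25, 5), c(5, 25))   # 5 clusters of 25, then 25 clusters of 5
```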

mu

Mean in the first arm; accepts numeric, default 0. Required.

mu2

Mean in the second arm; accepts numeric. Required.

At least 2 of the following must be specified:

ICC

Intra-cluster correlation coefficient; accepts a value between 0 and 1.

sigma_sq

Within-cluster variance; accepts numeric.

sigma_b_sq

Between-cluster variance; accepts numeric.
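Any two of these three quantities determine the third through the usual definition ICC = sigma_b_sq / (sigma_b_sq + sigma_sq). The sketch below shows that arithmetic; complete_variance is a hypothetical helper written for illustration, not a clusterPower function.

```r
# Hypothetical helper (not part of clusterPower): fill in the missing
# variance parameter from the other two, using
#   ICC = sigma_b_sq / (sigma_b_sq + sigma_sq)
complete_variance <- function(ICC = NA, sigma_sq = NA, sigma_b_sq = NA) {
  n_given <- sum(!is.na(c(ICC, sigma_sq, sigma_b_sq)))
  if (n_given < 2) {
    stop("Specify at least two of ICC, sigma_sq, sigma_b_sq")
  }
  if (is.na(ICC)) {
    ICC <- sigma_b_sq / (sigma_b_sq + sigma_sq)
  } else if (is.na(sigma_sq)) {
    sigma_sq <- sigma_b_sq * (1 - ICC) / ICC
  } else if (is.na(sigma_b_sq)) {
    sigma_b_sq <- ICC * sigma_sq / (1 - ICC)
  }
  c(ICC = ICC, sigma_sq = sigma_sq, sigma_b_sq = sigma_b_sq)
}

# ICC of 0.3 with within-cluster variance 20 implies sigma_b_sq of about 8.57143
complete_variance(ICC = 0.3, sigma_sq = 20)
```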

The defaults for the following are all NA, implying equal variance parameters for the two groups. If one of the following is given, variance parameters differ between treatment groups, and at least 2 of the following must be specified:

ICC2

Intra-cluster correlation coefficient for clusters in the second arm.

sigma_sq2

Within-cluster variance for clusters in the second arm.

sigma_b_sq2

Between-cluster variance for clusters in the second arm.

Optional parameters:

alpha

Significance level; default = 0.05.

method

Analytical method, either Generalized Linear Mixed Effects Model (GLMM, default) or Generalized Estimating Equation (GEE). Accepts c('glmm', 'gee').

quiet

When set to FALSE, displays simulation progress and estimated completion time; default is FALSE.

allSimData

Option to include a list of all simulated datasets in the output object. Default = FALSE.

seed

Option to set the seed. Default, NA, selects a seed based on the system clock.

poorFitOverride

Option to override stop() if more than 25% of fits fail to converge. Default = FALSE.

timelimitOverride

Logical. When FALSE, stops execution if the estimated completion time is more than 2 minutes. Defaults to TRUE.

lowPowerOverride

Option to override stop() if the power is less than 0.5 after the first 50 simulations and every ten simulations thereafter. On function execution stop, the actual power is printed in the stop message. Default = FALSE. When TRUE, this check is ignored and the calculated power is returned regardless of value.

irgtt

Logical. Default = FALSE. Is the experimental design an individually randomized group treatment trial? For details, see ?cps.irgtt.normal.

nofit

Option to skip model fitting and analysis and instead return a dataframe with the simulated datasets. Default = FALSE.

Value

If nofit = FALSE, a list with the following components:

Character string indicating total number of simulations and simulation type
Number of simulations
Data frame with columns "Power" (Estimated statistical power), "lower.95.ci" (Lower 95% confidence interval bound), "upper.95.ci" (Upper 95% confidence interval bound), "Alpha" (Probability of committing a type I error, i.e. rejecting a true null), "Beta" (Probability of committing a type II error, i.e. failing to reject a false null). Note that non-convergent models are returned for review, but not included in this calculation.
Analytic method used for power estimation
Significance level
Vector containing user-defined cluster sizes
Vector containing user-defined number of clusters in each arm
Data frame reporting ICC, variance parameters, and means for each arm
Vector containing expected group means based on user inputs
Data frame with columns: "Estimate" (Estimate of treatment effect for a given simulation), "Std.err" (Standard error for treatment effect estimate), "Test.statistic" (z-value (for GLMM) or Wald statistic (for GEE)), "p.value", "converge" (Did the model converge?)
If allSimData = TRUE, a list of data frames, each containing: "y" (Simulated response value), "trt" (Indicator for arm), "clust" (Indicator for cluster)
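As a rough illustration of how a power estimate and an exact interval can be derived from per-simulation results, the sketch below computes the proportion of significant p-values among converged fits and a Clopper-Pearson interval with stats::binom.test. The function summarize_power and the toy inputs are hypothetical; whether cps.normal uses exactly this routine is an assumption.

```r
# Hypothetical helper (not part of clusterPower): empirical power with an
# exact (Clopper-Pearson) 95% interval, ignoring non-convergent fits.
summarize_power <- function(p_values, converged, alpha = 0.05) {
  keep   <- converged & !is.na(p_values)   # drop non-convergent models
  n_used <- sum(keep)
  n_sig  <- sum(p_values[keep] < alpha)    # simulations rejecting the null
  ci     <- stats::binom.test(n_sig, n_used)$conf.int
  data.frame(
    Power       = n_sig / n_used,
    lower.95.ci = ci[1],
    upper.95.ci = ci[2],
    Alpha       = alpha,
    Beta        = 1 - n_sig / n_used
  )
}

# Toy inputs: six simulated fits, one of which did not converge
p_values  <- c(0.01, 0.20, 0.03, 0.04, 0.60, 0.001)
converged <- c(TRUE, TRUE, TRUE, FALSE, TRUE, TRUE)
summarize_power(p_values, converged)
```

With only the five converged fits counted, three are significant at the 0.05 level, so the estimated power in this toy case is 3/5.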

If nofit = TRUE, a data frame of the simulated data sets, containing:

"arm" (Indicator for treatment arm)
"clust" (Indicator for cluster)
"y1" ... "yn" (Simulated response value for each of the nsim data sets).
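A data frame of this shape can be split back into individual data sets by picking one response column at a time. The sketch below builds a small stand-in data frame by hand rather than calling cps.normal; the 0/1 coding of "arm" and the helper name get_dataset are assumptions made for illustration.

```r
# Toy stand-in for a nofit-style result: two arms, four clusters of two
# subjects, three simulated response columns (values are arbitrary).
set.seed(1)
sims <- data.frame(
  arm   = rep(c(0, 1), each = 4),   # assumed 0/1 coding of the arm indicator
  clust = rep(1:4, each = 2),
  y1    = rnorm(8),
  y2    = rnorm(8),
  y3    = rnorm(8)
)

# Hypothetical helper: extract the k-th simulated data set in long form
get_dataset <- function(sims, k) {
  data.frame(
    y     = sims[[paste0("y", k)]],
    arm   = sims$arm,
    clust = sims$clust
  )
}

d2 <- get_dataset(sims, 2)
tapply(d2$y, d2$arm, mean)   # arm means in the second simulated data set
```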

Testing details

This function has been verified, where possible, against reference values from the NIH's GRT Sample Size Calculator, PASS11, CRTsize::n4means, and clusterPower::cpa.normal.
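For a quick sanity check of a simulated result, a closed-form approximation based on the design effect 1 + (m - 1) * ICC can be computed by hand. The sketch below uses a normal (z) approximation for equal arms and equal cluster sizes; approx_power is a hypothetical helper, and it is not claimed to reproduce what cpa.normal or any of the tools listed above compute.

```r
# Hypothetical helper: z-approximation to power for a two-arm parallel
# cluster-randomized trial with equal cluster counts and sizes per arm.
approx_power <- function(nclusters, nsubjects, mu, mu2, ICC, sigma_sq,
                         alpha = 0.05) {
  sigma_b_sq <- ICC * sigma_sq / (1 - ICC)      # between-cluster variance
  total_var  <- sigma_sq + sigma_b_sq           # total outcome variance
  deff       <- 1 + (nsubjects - 1) * ICC       # design effect
  # Standard error of the difference in arm means
  se     <- sqrt(2 * total_var * deff / (nclusters * nsubjects))
  z_crit <- qnorm(1 - alpha / 2)
  z_eff  <- abs(mu2 - mu) / se
  # Two-sided power under the normal approximation
  pnorm(z_eff - z_crit) + pnorm(-z_eff - z_crit)
}

# 10 clusters per arm, 25 subjects per cluster, ICC 0.3, sigma_sq 20,
# arm means 1 and 4.75: about 0.78
approx_power(nclusters = 10, nsubjects = 25, mu = 1, mu2 = 4.75,
             ICC = 0.3, sigma_sq = 20)
```

Because this uses a z rather than a t reference distribution, it tends to be slightly optimistic when the number of clusters is small.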

Details

The data generating model for observation j in cluster i is: $y_{ij} \sim N(\mu + b_i, \sigma^2)$ for the first group or arm, where $b_i \sim N(0, \sigma_b^2)$, while for the second group,

$y_{ij} \sim N(\mu_2 + b_i, \sigma_2^2)$ where $b_i \sim N(0, \sigma_{b_2}^2)$; if none of $\sigma_2^2$, $\sigma_{b_2}^2$ or ICC2 are used, then the second group uses $b_i \sim N(0, \sigma_b^2)$ and $y_{ij} \sim N(\mu_2 + b_i, \sigma^2)$.

All random terms are generated independently of one another.

For calls without $\sigma_2^2$, $\sigma_{b_2}^2$ or ICC2, and using method="glmm", the fitted model is: $y_{ij} \mid b_i = \mu + \beta_1 x_{ij} + b_i + e_{ij}$

with $\beta_1 = \mu_2 - \mu$, treatment group indicator $x_{ij} = 0$ for the first group and $x_{ij} = 1$ for the second, with $b_i \sim N(0, \sigma_b^2)$ and $e_{ij} \sim N(0, \sigma^2)$. In this case, both the random effects distribution and the residual distribution are the same for both conditions.

Otherwise, for method="glmm" the fitted model is: $y_{ij} \mid b_i = \mu + \beta_1 x_{ij} + b_i I(x_{ij}=0) + e_{ij} I(x_{ij}=0) + g_i I(x_{ij}=1) + f_{ij} I(x_{ij}=1)$

with $\beta_1$, $x_{ij}$, $b_i$, and $e_{ij}$ as above, with $g_i \sim N(0, \sigma_{b_2}^2)$ and $f_{ij} \sim N(0, \sigma_2^2)$, the random effects and residual distribution in the second group.
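The data generating model can be sketched in base R by drawing one random effect per cluster and one residual per subject. The function simulate_crt below is a hypothetical illustration that assumes equal cluster sizes within each arm; it is not the package's internal simulation code.

```r
# Hypothetical sketch: draw one data set from the generating model above,
# with arm-specific within- and between-cluster variances.
simulate_crt <- function(nclusters, nsubjects, mu, mu2,
                         sigma_sq, sigma_b_sq,
                         sigma_sq2 = sigma_sq, sigma_b_sq2 = sigma_b_sq) {
  one_arm <- function(k, m, arm_mean, s2, sb2, trt, first_id) {
    b     <- rnorm(k, mean = 0, sd = sqrt(sb2))     # one random effect per cluster
    clust <- rep(seq_len(k), each = m)              # cluster index per subject
    y     <- arm_mean + b[clust] + rnorm(k * m, mean = 0, sd = sqrt(s2))
    data.frame(y = y, trt = trt, clust = first_id + clust - 1)
  }
  rbind(
    one_arm(nclusters[1], nsubjects[1], mu,  sigma_sq,  sigma_b_sq,  0, 1),
    one_arm(nclusters[2], nsubjects[2], mu2, sigma_sq2, sigma_b_sq2, 1,
            nclusters[1] + 1)
  )
}

set.seed(123)
dat <- simulate_crt(nclusters = c(5, 25), nsubjects = c(25, 5),
                    mu = 1, mu2 = 4.75,
                    sigma_sq = 20, sigma_b_sq = 8.57143,
                    sigma_sq2 = 9, sigma_b_sq2 = 1)
table(dat$trt)   # 125 subjects in each arm
```

For the equal-variance case, a data set like this could be analyzed with a random-intercept model such as lme4::lmer(y ~ trt + (1 | clust), data = dat), assuming lme4 is installed.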

Examples

# NOT RUN {
# Estimate power for a trial with 10 clusters in each arm and 25 subjects in each 
# cluster, with an ICC of .3, sigma squared of 20 (implying sigma_b^2 of 8.57143) 
# in each group, with arm means of 1 and 4.75 in the two groups, using 100 simulated 
# data sets. The resulting estimated power should be 0.78.
   
# }
# NOT RUN {
normal.sim = cps.normal(nsim = 100, nsubjects = 25, nclusters = 10, mu = 1, 
  mu2 = 4.75, ICC = 0.3, sigma_sq = 20, seed = 123)
# }
# NOT RUN {


# Estimate power for a trial with 5 clusters in one arm, those clusters having 25 subjects 
# each, 25 clusters in the other arm, those clusters having 5 subjects each, the first arm
# having a sigma squared of 20 and sigma_b squared of 8.8571429, and the second a sigma squared
# of 9 and a sigma_b squared of 1, with estimated arm means of 1 and 4.75 in the first and 
# second groups, respectively, using 100 simulated data sets analyzed by the GEE method. 
# The estimated power should be 0.79, assuming seed = 123.

# }
# NOT RUN {
normal.sim2 = cps.normal(nsim = 100, nclusters = c(5,25), nsubjects = c(25,5), mu = 1, 
  mu2 = 4.75, sigma_sq = 20,sigma_b_sq = 8.8571429, sigma_sq2 = 9, sigma_b_sq2 = 1, 
  method = "gee", seed = 123)
# }
# NOT RUN {

# Estimate power for a trial with 6 clusters in one arm, those clusters having
# 4, 5, 6, 7, 7, and 7 subjects each, and 10 clusters in the other arm,
# those clusters having 5 subjects each, with sigma_b_sq = .3 and an ICC of .3 in both arms.
# We have estimated arm means of 1 and 2 in the first and second arms, respectively, and we use
# 100 simulated data sets analyzed by the GLMM method.

# }
# NOT RUN {
normal.sim2 = cps.normal(nsim = 100, nclusters = c(6,10), 
  nsubjects = c(c(4, 5, 6, 7, 7, 7), rep(5, times = 10)),
  mu = 1, mu2 = 2, sigma_b_sq = .3, ICC = .3, method = "glmm",
  seed = 1)
# }
# NOT RUN {
# The resulting estimated power (if you set seed = 1) should be about 0.76.

# Estimate power for a trial with 3 clusters in one arm, 
# those clusters having 25, 35, and 45 subjects each, and 10 clusters 
# in the other arm, those clusters having 5 subjects each, the first arm
# having a sigma squared of 20 and sigma_b squared of 8.8571429, and the second a sigma squared
# of 9 and a sigma_b squared of 1, with estimated arm means of 1 and 4.75 in the first and 
# second groups, respectively, using 100 simulated data sets analyzed by the GLMM method.

# }
# NOT RUN {
normal.sim2 <- cps.normal(nsim = 100, nclusters = c(3,10), 
  nsubjects = c(25, 35, 45, rep(5, times = 10)),
  mu = 1, mu2 = 4.75, sigma_sq = 20, sigma_b_sq = 8.8571429,
  sigma_sq2 = 9, sigma_b_sq2 = 1, method = "glmm")
# }
# NOT RUN {
# The resulting estimated power (if you set seed = 1) should be about 0.71.


# }
