cps.binary: Power simulations for cluster-randomized trials: Parallel Designs, Binary Outcome

Description

This function uses Monte Carlo methods (simulations) to estimate power for cluster-randomized trials. Users can adjust a variety of parameters to match the simulations to their planned experimental design.

Users must specify the desired number of simulations, the number of subjects per cluster, the number of clusters per arm, the expected probability of the outcome in each group (p1 and p2), and the between-cluster variance (sigma_b_sq). Default values are provided for the significance level, analytic method, progress updates, and whether the simulated data sets are retained.

Usage

cps.binary(
  nsim = NULL,
  nsubjects = NULL,
  nclusters = NULL,
  p1 = NULL,
  p2 = NULL,
  sigma_b_sq = NULL,
  sigma_b_sq2 = NULL,
  alpha = 0.05,
  method = "glmm",
  quiet = FALSE,
  allSimData = FALSE,
  seed = NA,
  nofit = FALSE,
  poorFitOverride = FALSE,
  lowPowerOverride = FALSE,
  timelimitOverride = TRUE,
  irgtt = FALSE
)

Arguments

nsim

Number of datasets to simulate; accepts an integer. Required.

nsubjects

Number of subjects per cluster; accepts either a scalar (implying equal cluster sizes for the two groups), a vector of length two (equal cluster sizes within arm), or a vector of length sum(nclusters) (unequal cluster sizes within arm). Required.

nclusters

Number of clusters per treatment group; accepts a single integer (if there are the same number of clusters in each arm) or a vector of 2 integers (if the number of clusters differs between arms). If a vector of cluster sizes longer than 2 is provided in nsubjects, sum(nclusters) must equal the length of the nsubjects vector. Required.
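To illustrate how the three accepted forms of nsubjects map onto individual clusters, here is a minimal base-R sketch. The helper expand_cluster_sizes() is hypothetical (not part of clusterPower), and it assumes that a per-cluster vector lists all first-arm clusters before the second-arm clusters, as in the second example below.

```r
# Hypothetical helper, not part of clusterPower: expand nsubjects and
# nclusters into one size per cluster for each arm.
expand_cluster_sizes <- function(nsubjects, nclusters) {
  # A single nclusters value means the same number of clusters in each arm
  if (length(nclusters) == 1) nclusters <- rep(nclusters, 2)
  total <- sum(nclusters)
  if (length(nsubjects) == 1) {
    # Scalar: every cluster in both arms has the same size
    sizes <- rep(nsubjects, total)
  } else if (length(nsubjects) == 2) {
    # Length two: equal sizes within each arm, possibly different between arms
    sizes <- c(rep(nsubjects[1], nclusters[1]), rep(nsubjects[2], nclusters[2]))
  } else if (length(nsubjects) == total) {
    # One size per cluster (assumed order: first-arm clusters, then second-arm)
    sizes <- nsubjects
  } else {
    stop("length(nsubjects) must be 1, 2, or sum(nclusters)")
  }
  list(arm1 = sizes[seq_len(nclusters[1])],
       arm2 = sizes[nclusters[1] + seq_len(nclusters[2])])
}

# Unequal sizes in the first arm, equal sizes in the second arm
s <- expand_cluster_sizes(c(rep(10, 9), 100, rep(20, 10)), c(10, 10))
```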

p1

Expected probability of outcome in first group.

p2

Expected probability of outcome in second group.

sigma_b_sq

Between-cluster variance; if sigma_b_sq2 is not specified, between-cluster variances are assumed to be equal in the two arms. Accepts numeric. Required.

sigma_b_sq2

Between-cluster variance for clusters in second group. Only required if between-cluster variances differ between treatment arms.

alpha

Significance level; default = 0.05.

method

Data analysis method, either generalized linear mixed effects model (GLMM) or generalized estimating equations (GEE). Accepts c('glmm', 'gee'); default = 'glmm'.

quiet

When set to FALSE, displays simulation progress and estimated completion time; default = FALSE.

allSimData

Option to output list of all simulated datasets; default = FALSE.

seed

Option to set the random number seed for reproducible results. Default = NA.

nofit

Option to skip model fitting and analysis and only return the simulated data. Default = FALSE.

poorFitOverride

Option to override stop() if more than 25% of fits fail to converge. Default = FALSE.

lowPowerOverride

Option to override stop() if the estimated power is less than 0.5 after the first 50 simulations or at any check every ten simulations thereafter. When execution stops, the estimated power at that point is printed in the stop message. Default = FALSE. When TRUE, this check is skipped and the calculated power is returned regardless of its value.
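The early-stopping rule can be sketched as follows. This is a hypothetical reading of the description above, not clusterPower's internal code; the function name check_low_power() and the use of all supplied p-values are assumptions.

```r
# Hypothetical sketch of the lowPowerOverride rule: check after 50
# simulations and every 10 thereafter; stop if estimated power < 0.5.
check_low_power <- function(pvals, alpha = 0.05, lowPowerOverride = FALSE) {
  n <- length(pvals)
  at_checkpoint <- n >= 50 && (n - 50) %% 10 == 0
  # No check when overridden or between checkpoints
  if (lowPowerOverride || !at_checkpoint) return(invisible(NULL))
  power <- mean(pvals < alpha)
  if (power < 0.5) {
    stop(sprintf("Calculated power is %.2f after %d simulations", power, n))
  }
  invisible(power)
}
```

For example, if only 10 of the first 50 p-values fall below alpha, the sketch stops and reports a power of 0.20; with lowPowerOverride = TRUE it never stops.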

timelimitOverride

Logical. When FALSE, stops execution if the estimated completion time is more than 2 minutes. Defaults to TRUE.

irgtt

Logical. Default = FALSE. Is the experimental design an individually randomized group treatment trial? For details, see ?cps.irgtt.binary.

Value

If nofit = FALSE, a list with the following components:

Character string indicating total number of simulations, simulation type, and number of convergent models
Number of simulations
Data frame with columns "Power" (estimated statistical power), "lower.95.ci" (lower 95% confidence limit for power), "upper.95.ci" (upper 95% confidence limit for power), "Alpha" (probability of committing a Type I error, i.e. rejecting a true null), "Beta" (probability of committing a Type II error, i.e. failing to reject a false null). Non-convergent models are returned for review but are not included in this calculation.
Analytic method used for power estimation
Significance level
Vector containing user-defined cluster sizes
Vector containing user-defined number of clusters
Data frame reporting sigma_b_sq for each group
Vector containing user-supplied outcome probabilities and estimated odds ratio
Data frame containing three estimates of ICC
Data frame with columns: "Estimate" (Estimate of treatment effect for a given simulation), "Std.err" (Standard error for treatment effect estimate), "Test.statistic" (z-value (for GLMM) or Wald statistic (for GEE)), "p.value", "converge" (Did simulated model converge?)
If allSimData = TRUE, list of data frames, each containing: "y" (Simulated response value), "trt" (Indicator for treatment group), "clust" (Indicator for cluster)
List of warning messages produced by non-convergent models; includes the model number for cross-referencing against model.estimates
Logical vector reporting whether models converged.
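The odds ratio that corresponds to the two outcome probabilities can be computed directly from p1 and p2. This sketch assumes the first arm is the reference group; whether the odds ratio reported with the inputs uses this orientation, or is derived from p1 and p2 at all, is an assumption. The helper names are hypothetical.

```r
# Hypothetical helpers: odds and the odds ratio implied by p1 and p2
odds <- function(p) p / (1 - p)
implied_or <- function(p1, p2) odds(p2) / odds(p1)  # arm 1 as reference (assumed)

# log(OR) equals logit(p2) - logit(p1), the treatment effect on the logit scale
implied_or(0.8, 0.5)
```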

If nofit = TRUE, a data frame of the simulated data sets, containing:

"arm" (Indicator for treatment arm)
"cluster" (Indicator for cluster)
"y1" ... "yn" (Simulated response value for each of the nsim data sets).

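With nofit = TRUE each simulated data set is stored as its own response column, so a single data set can be pulled out for separate analysis. The data frame below is a made-up toy stand-in for that output (2 arms, 2 clusters per arm, 3 subjects per cluster, nsim = 2); numbering clusters consecutively across arms and the helper extract_dataset() are assumptions for illustration.

```r
# Toy stand-in for nofit = TRUE output; response values are invented
sim_data <- data.frame(
  arm = rep(1:2, each = 6),
  cluster = rep(1:4, each = 3),
  y1 = c(1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0),
  y2 = c(0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1)
)

# Hypothetical helper: the k-th simulated data set with one response column
extract_dataset <- function(sim_data, k) {
  data.frame(arm = sim_data$arm,
             cluster = sim_data$cluster,
             y = sim_data[[paste0("y", k)]])
}

d2 <- extract_dataset(sim_data, 2)
# Observed outcome proportion per arm in simulated data set 2
aggregate(y ~ arm, data = d2, FUN = mean)
```
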
Testing details

This function has been verified against reference values from the NIH's GRT Sample Size Calculator, PASS11, CRTsize::n4prop, and clusterPower::cpa.binary.

Details

The data generating model for observation j in cluster i is:

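$$y_{ij} \sim \mathrm{Bernoulli}\left(\frac{e^{\mathrm{logit}(p_1) + b_i}}{1 + e^{\mathrm{logit}(p_1) + b_i}}\right)$$

for the first group or arm, where $b_i \sim N(0, \sigma_b^2)$ and $\sigma_b^2$ is sigma_b_sq, while for the second group

$$y_{ij} \sim \mathrm{Bernoulli}\left(\frac{e^{\mathrm{logit}(p_2) + b_i}}{1 + e^{\mathrm{logit}(p_2) + b_i}}\right)$$

where $b_i \sim N(0, \sigma_{b_2}^2)$ and $\sigma_{b_2}^2$ is sigma_b_sq2; if $\sigma_{b_2}^2$ is not used, the second group also uses $b_i \sim N(0, \sigma_b^2)$.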
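A minimal base-R sketch of this data-generating model for one simulated data set follows. It uses qlogis() as logit and plogis() as the inverse logit. The function name, the 0/1 coding of trt, and numbering clusters consecutively across arms are assumptions; this is an illustration, not clusterPower's internal code.

```r
# Hypothetical sketch: one data set from the random-intercept logistic model
simulate_binary_crt <- function(sizes1, sizes2, p1, p2, sigma_b_sq,
                                sigma_b_sq2 = sigma_b_sq) {
  # Cluster effects b_i ~ N(0, variance), one per cluster in each arm
  b1 <- rnorm(length(sizes1), mean = 0, sd = sqrt(sigma_b_sq))
  b2 <- rnorm(length(sizes2), mean = 0, sd = sqrt(sigma_b_sq2))
  # Linear predictor logit(p) + b_i, repeated for each subject in cluster i
  eta <- c(rep(qlogis(p1) + b1, times = sizes1),
           rep(qlogis(p2) + b2, times = sizes2))
  data.frame(
    y = rbinom(length(eta), size = 1, prob = plogis(eta)),
    trt = rep(c(0, 1), c(sum(sizes1), sum(sizes2))),
    clust = rep(seq_len(length(sizes1) + length(sizes2)),
                times = c(sizes1, sizes2))
  )
}

# Settings from the first example: 10 clusters of 20 subjects per arm
set.seed(42)
d <- simulate_binary_crt(rep(20, 10), rep(20, 10), p1 = 0.8, p2 = 0.5,
                         sigma_b_sq = 1, sigma_b_sq2 = 1.2)
```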

All random terms are generated independent of one another.
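Under this model, one common ICC definition is the latent-variable (threshold) ICC discussed by Eldridge et al. and Snijders & Bosker, in which the level-1 variance of the standard logistic distribution is $\pi^2/3$. Whether this matches one of the three ICC estimates returned by the function is an assumption; the helper name is hypothetical.

```r
# Latent-variable ICC for a random-intercept logistic model
icc_latent <- function(sigma_b_sq) sigma_b_sq / (sigma_b_sq + pi^2 / 3)

icc_latent(1)    # sigma_b_sq in the first example
icc_latent(1.2)  # sigma_b_sq2 in the first example
```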

Non-convergent models are not included in the calculation of exact confidence intervals.
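The power summary can be sketched as the proportion of convergent fits with p < alpha, together with an exact (Clopper-Pearson) 95% interval from stats::binom.test(). The column names mirror the power data frame described under Value; assuming that "exact" means Clopper-Pearson, and the helper name summarize_power(), are illustrative choices.

```r
# Hypothetical sketch: power with an exact CI, excluding non-convergent fits
summarize_power <- function(p.value, converge, alpha = 0.05) {
  ok <- converge & !is.na(p.value)   # keep convergent fits only
  hits <- sum(p.value[ok] < alpha)
  n <- sum(ok)
  ci <- stats::binom.test(hits, n, conf.level = 0.95)$conf.int
  data.frame(Power = hits / n, lower.95.ci = ci[1], upper.95.ci = ci[2],
             Alpha = alpha, Beta = 1 - hits / n)
}

# 100 fits, the first of which failed to converge
res <- summarize_power(p.value = c(rep(0.01, 80), rep(0.3, 20)),
                       converge = c(FALSE, rep(TRUE, 99)))
```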

References

Eldridge, S., Ukoumunne, O. & Carlin, J. The Intra-Cluster Correlation Coefficient in Cluster Randomized Trials: A Review of Definitions. International Statistical Review (2009), 77(3), 378-394. doi: 10.1111/j.1751-5823.2009.00092.x

Snijders, T. & Bosker, R. Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling. London: Sage, 1999.

Wu, S., Crespi, C. M. & Wong, W. K. Comparison of Methods for Estimating Intraclass Correlation Coefficient for Binary Responses in Cancer Prevention Cluster Randomized Trials. Contemporary Clinical Trials (2012), 33(5), 869-880. doi: 10.1016/j.cct.2012.05.004

London: Arnold; 2000.

Examples


library(clusterPower)

# Not run:
# Estimate power for a trial with 10 clusters in each arm and 20 subjects in
# each cluster, with an outcome probability of 0.8 in the first arm and 0.5 in
# the second arm, sigma_b_sq = 1 in the first arm and sigma_b_sq2 = 1.2 in
# the second arm.
binary.sim <- cps.binary(nsim = 100, nsubjects = 20,
  nclusters = 10, p1 = 0.8,
  p2 = 0.5, sigma_b_sq = 1,
  sigma_b_sq2 = 1.2, alpha = 0.05,
  method = 'glmm', allSimData = FALSE)

# Estimate power for the same trial, except that in the first arm 9 of the 10
# clusters have 10 subjects and the tenth has 100, while in the second arm all
# clusters have 20 subjects; the simulated data are analyzed with GEE.
binary.sim2 <- cps.binary(nsim = 100,
  nsubjects = c(c(rep(10, 9), 100), rep(20, 10)),
  nclusters = c(10, 10), p1 = 0.8,
  p2 = 0.5, sigma_b_sq = 1,
  sigma_b_sq2 = 1.2, alpha = 0.05,
  method = 'gee', allSimData = FALSE)
