This function uses iterative simulations to determine approximate power for cluster-randomized controlled trials. Users can modify a variety of parameters to tailor the simulations to their experimental design.
Runs power simulations for difference-in-difference cluster-randomized controlled trials with count outcomes.
Users must specify the desired number of simulations, the number of subjects per cluster, the number of clusters per arm, the between-cluster variance, two of the following (expected count in arm 1, expected count in arm 2, difference in counts between groups), the significance level, the analytic method, and whether progress updates should be displayed while the function runs.
cps.did.count(
nsim = NULL,
nsubjects = NULL,
nclusters = NULL,
c1t0 = 0,
c2t0 = NULL,
c1t1 = NULL,
c2t1 = NULL,
c.diff = NULL,
sigma_b_sq0 = NULL,
sigma_b_sq1 = 0,
family = "poisson",
analysis = "poisson",
negBinomSize = 1,
method = "glmm",
alpha = 0.05,
quiet = FALSE,
allSimData = FALSE,
poorFitOverride = FALSE,
lowPowerOverride = FALSE,
timelimitOverride = TRUE,
seed = NA,
nofit = FALSE
)
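The expected-count arguments in the signature above are linked by the difference-in-difference contrast c.diff = (c1t1 - c1t0) - (c2t1 - c2t0). As a minimal sketch, the hypothetical helper resolve_did_counts() below (not part of clusterPower) applies the fallbacks documented for c2t0 and c1t1 and derives whichever of c2t1 or c.diff is missing. Accepting c.diff in place of c2t1 is an assumption of this sketch, not confirmed behavior of cps.did.count.

```r
# Hypothetical helper, NOT part of clusterPower: applies the documented
# count defaults and the c.diff definition to a set of inputs.
resolve_did_counts <- function(c1t0 = 0, c2t0 = NULL, c1t1 = NULL,
                               c2t1 = NULL, c.diff = NULL) {
  # Documented fallbacks: a missing c2t0 or c1t1 copies c1t0
  if (is.null(c2t0)) c2t0 <- c1t0
  if (is.null(c1t1)) c1t1 <- c1t0
  if (is.null(c2t1) && is.null(c.diff)) {
    stop("supply c2t1 or c.diff")
  }
  # Assumption: solve c.diff = (c1t1 - c1t0) - (c2t1 - c2t0) for c2t1
  if (is.null(c2t1)) c2t1 <- c2t0 + (c1t1 - c1t0) - c.diff
  implied <- (c1t1 - c1t0) - (c2t1 - c2t0)
  if (is.null(c.diff)) {
    c.diff <- implied
  } else if (!isTRUE(all.equal(c.diff, implied))) {
    stop("c.diff is inconsistent with the four expected counts")
  }
  c(c1t0 = c1t0, c2t0 = c2t0, c1t1 = c1t1, c2t1 = c2t1, c.diff = c.diff)
}

# Counts from the example call: arm 1 stays at 5, arm 2 rises from 5 to 8
resolve_did_counts(c1t0 = 5, c1t1 = 5, c2t0 = 5, c2t1 = 8)
```

For the example counts the contrast is (5 - 5) - (8 - 5) = -3; calling the helper with c.diff = -3 instead of c2t1 recovers c2t1 = 8.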
nsim: Number of datasets to simulate; accepts integer (required).
nsubjects: Number of subjects per cluster; accepts integer (required).
nclusters: Number of clusters per arm; accepts integer (required).
The expected counts are set by the following five arguments:
c1t0: Required. Expected outcome count in arm 1 at baseline. Default is 0.
c2t0: Optional. Expected outcome count in arm 2 at baseline. If no value is provided, c2t0 = c1t0 is assumed.
c1t1: Optional. Expected outcome count in arm 1 at follow-up. If no value is provided, c1t1 = c1t0 is assumed.
c2t1: Required. Expected outcome count in arm 2 at follow-up.
c.diff: Optional if c1t1 and c2t0 are provided. Expected difference in outcome count between groups, defined as c.diff = (c1t1 - c1t0) - (c2t1 - c2t0).
sigma_b_sq0: Pre-treatment (time == 0) between-cluster variance; accepts a numeric scalar (indicating equal between-cluster variances for both arms) or a vector of length 2 specifying treatment-specific between-cluster variances.
sigma_b_sq1: Post-treatment (time == 1) between-cluster variance; accepts a numeric scalar (indicating equal between-cluster variances for both arms) or a vector of length 2 specifying treatment-specific between-cluster variances. For data simulation, sigma_b_sq1 is added to sigma_b_sq0, so that if sigma_b_sq0 = 5 and sigma_b_sq1 = 2, the between-cluster variance at time == 1 equals 7. Default = 0.
family: Distribution from which responses are simulated. Accepts Poisson ('poisson') or negative binomial ('neg.binom') (required); default = 'poisson'.
analysis: Family used for regression; currently only applicable for GLMM. Accepts c('poisson', 'neg.binom') (required); default = 'poisson'.
negBinomSize: Used only when simulating data from the negative binomial distribution (family = 'neg.binom'). This is the target number of successful trials, or equivalently the dispersion parameter (the shape parameter of the gamma mixing distribution). Must be strictly positive but need not be an integer. Defaults to 1.
method: Analytical method, either Generalized Linear Mixed Effects Model (GLMM) or Generalized Estimating Equation (GEE). Accepts c('glmm', 'gee') (required); default = 'glmm'.
alpha: Significance level for power estimation; accepts a value between 0 and 1; default = 0.05.
quiet: When set to FALSE, displays simulation progress and estimated completion time. Default = FALSE.
allSimData: Option to output a list of all simulated datasets. Default = FALSE.
poorFitOverride: Option to override stop() if more than 25% of fits fail to converge; default = FALSE.
lowPowerOverride: Option to override stop() if the power is less than 0.5 after the first 50 simulations and every ten simulations thereafter. When execution stops, the estimated power is printed in the stop message. Default = FALSE. When TRUE, this check is skipped and the calculated power is returned regardless of its value.
timelimitOverride: Logical. When FALSE, stops execution if the estimated completion time is more than 2 minutes. Defaults to TRUE.
seed: Option to set the seed. Default is NA.
nofit: Option to skip model fitting and analysis and return only the simulated data. Default = FALSE.
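The data-generating arguments above (sigma_b_sq0, sigma_b_sq1, family, negBinomSize) can be illustrated with a base-R sketch of one simulated data set laid out in the y, trt, clust, period columns described for allSimData. The helper simulate_did_counts() and its counts argument are hypothetical, and the model is an assumption rather than the package's code: a log link with normal cluster random intercepts, where each cluster's post-treatment intercept is its baseline intercept plus an independent N(0, sigma_b_sq1) draw, so the time-1 variance is sigma_b_sq0 + sigma_b_sq1 as documented. Negative binomial responses use rnbinom() with size = negBinomSize.

```r
# Hypothetical sketch of one simulated DID count data set; this is NOT the
# clusterPower implementation, and the random-effects model is an assumption.
simulate_did_counts <- function(nsubjects, nclusters,
                                counts,  # c(c1t0, c2t0, c1t1, c2t1)
                                sigma_b_sq0, sigma_b_sq1 = 0,
                                family = c("poisson", "neg.binom"),
                                negBinomSize = 1) {
  family <- match.arg(family)
  # A scalar variance means equal between-cluster variance in both arms
  s0 <- rep_len(sigma_b_sq0, 2)
  s1 <- rep_len(sigma_b_sq1, 2)
  arm <- rep(1:2, each = nclusters)  # arm (1 or 2) of each cluster
  # Baseline intercepts; post-treatment intercepts add an independent draw,
  # so Var(b1) = sigma_b_sq0 + sigma_b_sq1 within each arm
  b0 <- rnorm(2 * nclusters, mean = 0, sd = sqrt(s0[arm]))
  b1 <- b0 + rnorm(2 * nclusters, mean = 0, sd = sqrt(s1[arm]))
  # One row per subject, cluster and period
  d <- expand.grid(subject = seq_len(nsubjects),
                   clust = seq_len(2 * nclusters),
                   period = 0:1)
  d$trt <- arm[d$clust] - 1  # 0 = arm 1, 1 = arm 2
  b <- ifelse(d$period == 0, b0[d$clust], b1[d$clust])
  # counts index: 1 = arm 1 baseline, 2 = arm 2 baseline,
  #               3 = arm 1 follow-up, 4 = arm 2 follow-up
  mu <- counts[1 + d$trt + 2 * d$period] * exp(b)
  d$y <- if (family == "poisson") {
    rpois(nrow(d), lambda = mu)
  } else {
    rnbinom(nrow(d), size = negBinomSize, mu = mu)
  }
  d[, c("y", "trt", "clust", "period")]
}

# Settings mirroring the example call below
set.seed(123)
sim <- simulate_did_counts(nsubjects = 9, nclusters = 7,
                           counts = c(5, 5, 5, 8),
                           sigma_b_sq0 = c(1, 0.5), sigma_b_sq1 = c(0.5, 0.8))
head(sim)
```

Under this sketch the log-scale random intercepts inflate each cell's marginal mean by a factor of exp(variance / 2) relative to the nominal count; whether cps.did.count rescales for this is not stated on this page.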
A list with the following components:
Character string indicating total number of simulations, distribution of simulated data, and regression family
Number of simulations
Data frame with columns 'Power' (Estimated statistical power), 'lower.95.ci' (Lower 95% confidence interval bound), 'upper.95.ci' (Upper 95% confidence interval bound)
Analytic method used for power estimation
Data frame containing families for distribution and analysis of simulated data
Significance level
Vector containing user-defined cluster sizes
Vector containing user-defined number of clusters
Data frame reporting between-cluster variances at each time point for each arm
Vector containing expected counts and risk ratios based on user inputs
Data frame with columns: 'Period' (Pre/Post-treatment indicator), 'Arm.2' (Arm indicator), 'Value' (Mean response value)
Data frame with columns: 'Estimate' (Estimate of treatment effect for a given simulation), 'Std.Err' (Standard error for treatment effect estimate), 'Test.statistic' (z-value (for GLMM) or Wald statistic (for GEE)), 'p.value', 'converge' (Did simulated model converge?), 'sig.val' (Is p-value less than alpha?)
If allSimData = TRUE, a list of data frames, each containing:
'y' (Simulated response value),
'trt' (Indicator for arm),
'clust' (Indicator for cluster),
'period' (Indicator for time point)
If nofit = TRUE, a data frame of the simulated data sets, containing:
"arm" (Indicator for treatment arm)
"cluster" (Indicator for cluster)
"y1" ... "yn" (Simulated response value for each of the nsim data sets).
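The Power, lower.95.ci and upper.95.ci values summarize the per-simulation sig.val column. A hedged sketch of that summary follows; the helper summarize_power() is hypothetical, and it assumes power is the share of converged fits with p < alpha and that the interval is the exact Clopper-Pearson interval returned by binom.test(). The interval method clusterPower actually uses is not stated on this page. The prop.failed column corresponds to the 25% non-convergence threshold described for poorFitOverride.

```r
# Hypothetical helper, NOT the clusterPower API: turn per-simulation
# p-values and convergence flags into a power estimate with a 95% CI.
summarize_power <- function(p.value, converge = rep(TRUE, length(p.value)),
                            alpha = 0.05) {
  sig.val <- p.value < alpha               # mirrors the 'sig.val' column
  ok <- converge & !is.na(p.value)         # keep only converged fits
  n.ok <- sum(ok)
  n.sig <- sum(sig.val[ok])
  # Assumption: exact (Clopper-Pearson) binomial interval
  ci <- binom.test(n.sig, n.ok, conf.level = 0.95)$conf.int
  data.frame(Power = n.sig / n.ok,
             lower.95.ci = ci[1],
             upper.95.ci = ci[2],
             prop.failed = 1 - n.ok / length(p.value))  # compare with 0.25
}

# Eight illustrative simulations; the last fit did not converge
p <- c(0.01, 0.20, 0.03, 0.04, 0.50, 0.002, 0.07, 0.01)
conv <- c(rep(TRUE, 7), FALSE)
summarize_power(p, conv)
```

With these illustrative values, 4 of the 7 converged fits fall below 0.05, so Power is 4/7, and 1 of 8 fits (12.5%) failed, which stays under the 25% threshold.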
# Estimate power for a trial with 7 clusters in each arm and 9 subjects per
# cluster. Baseline between-cluster variances are 1 (arm 1) and 0.5 (arm 2),
# with post-treatment increments of 0.5 and 0.8, respectively. Expected counts
# stay at 5 in arm 1 and rise from 5 to 8 in arm 2, so c.diff = -3. We use 100
# simulated data sets analyzed by the GLMM method; setting seed = 123 makes
# the estimated power reproducible.
library(clusterPower)
did.count.sim <- cps.did.count(nsim = 100, nsubjects = 9, nclusters = 7,
                               c1t0 = 5, c1t1 = 5, c2t0 = 5, c2t1 = 8,
                               sigma_b_sq0 = c(1, 0.5), sigma_b_sq1 = c(0.5, 0.8),
                               family = 'poisson', analysis = 'poisson',
                               method = 'glmm', seed = 123)