Simulate Comparisons For Use in Sequential Markov Longitudinal Clinical Trial Simulations
estSeqMarkovOrd(
y,
times,
initial,
absorb = NULL,
intercepts,
parameter,
looks,
g,
formula,
ppo = NULL,
yprevfactor = TRUE,
groupContrast = NULL,
cscov = FALSE,
timecriterion = NULL,
coxzph = FALSE,
sstat = NULL,
rdsample = NULL,
maxest = NULL,
maxvest = NULL,
nsim = 1,
progress = FALSE,
pfile = ""
)
a data frame with number of rows equal to the product of nsim
, the length of looks
, and the length of parameter
, with variables sim
, parameter
, look
, est
(log odds ratio for group), and vest
(the variance of the latter). If timecriterion
is specified the data frame also contains loghr
(Cox log hazard ratio for group), lrchisq
(chi-square from Cox test for group), and if coxzph=TRUE
, phchisq
, the chi-square for testing proportional hazards. The attribute etimefreq
is also present if timecriterion
is present, and it provides the frequency distribution of derived event times by group and censoring/event indicator. If sstat
is given, the attribute sstat
is also present, and it contains an array with dimensions corresponding to simulations, parameter values within simulations, id
, and a two-column subarray with columns group
and y
, the latter being the summary measure computed by the sstat
function. The returned data frame also has attribute lrmcoef
which contains the last-look logistic regression coefficient estimates over the nsim
simulations and the parameter settings, and an attribute failures
which is a data frame containing the variables reason
and frequency
cataloging the reasons for unsuccessful model fits.
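As a rough illustration of the row count described above (not the function's actual implementation, and not necessarily its row order), the index variables can be thought of as a full grid over simulations, parameter values, and looks. The values assigned to nsim, parameter, and looks here are hypothetical.

```r
# Hypothetical settings, chosen only for illustration
nsim      <- 3
parameter <- c(0, -0.5)
looks     <- c(50, 100, 150)

# One row per simulation x parameter value x look
grid <- expand.grid(look = looks, parameter = parameter, sim = seq_len(nsim))

# Row count matches the product stated in the text
nrow(grid) == nsim * length(looks) * length(parameter)
```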
vector of possible y values in order (numeric, character, factor)
vector of measurement times
a vector of probabilities summing to 1.0 that specifies the frequency distribution of initial values to be sampled from. The vector must have names that correspond to values of y
representing non-absorbing states.
vector of absorbing states, a subset of y
. The default is no absorbing states. Observations are truncated when an absorbing state is simulated. May be numeric, character, or factor.
vector of intercepts in the proportional odds model. There must be one fewer of these than the length of y
.
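The constraints on y, initial, absorb, and intercepts described above can be checked before running a simulation. The following is a minimal sketch with made-up values; the four-level outcome, the choice of absorbing state, and the intercept values are assumptions for illustration only.

```r
y          <- 1:4                       # ordered outcome levels
absorb     <- 4                         # hypothetical absorbing state
initial    <- c("2" = 0.3, "3" = 0.7)   # names are non-absorbing values of y
intercepts <- c(3, 0, -3)               # hypothetical values; one fewer than length(y)

# Sanity checks mirroring the argument descriptions
stopifnot(
  abs(sum(initial) - 1) < 1e-8,
  all(names(initial) %in% as.character(setdiff(y, absorb))),
  all(absorb %in% y),
  length(intercepts) == length(y) - 1
)
```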
vector of true parameter (effects; group differences) values. These are group 2:1 log odds ratios in the transition model, conditioning on the previous y
.
integer vector of ID numbers at which maximum likelihood estimates and their estimated variances are computed. For a single look specify a scalar value for looks
equal to the number of subjects in the sample.
a user-specified function of three or more arguments which in order are yprev
- the value of y
at the previous time, the current time t
, the gap
between the previous time and the current time, an optional (usually named) covariate vector X
, and optional arguments such as a regression coefficient value to simulate from. The function needs to allow yprev
to be a vector and yprev
must not include any absorbing states. The g
function returns the linear predictor for the proportional odds model aside from intercepts
. The returned value must be a matrix with row names taken from yprev
. If the model is a proportional odds model, the returned value must be one column. If it is a partial proportional odds model, the value must have one column for each distinct value of the response variable Y after the first one, with the levels of Y used as optional column names. So columns correspond to intercepts
. The different columns are used for y
-specific contributions to the linear predictor (aside from intercepts
) for a partial or constrained partial proportional odds model. Parameters for partial proportional odds effects may be included in the ... arguments.
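A minimal sketch of a g function for the plain proportional odds case may help make these requirements concrete. The coefficient values are invented, the outcome states are assumed to be numeric, and the covariate vector X is assumed to carry a named element group coded 1 or 2; none of this is taken from the package itself.

```r
# Hypothetical transition-model linear predictor (excluding intercepts)
g <- function(yprev, t, gap, X, parameter = 0) {
  # gap is accepted but unused in this simple sketch
  lp <- 0.4 * (yprev - 1) -                 # effect of previous state (made up)
        0.05 * t +                          # linear time effect (made up)
        parameter * (X[["group"]] == 2)     # group 2:1 log odds ratio
  # One column (proportional odds), row names taken from yprev
  matrix(lp, ncol = 1, dimnames = list(as.character(yprev), NULL))
}

lp <- g(yprev = c(1, 2, 3), t = 2, gap = 1, X = c(group = 2), parameter = -0.5)
lp
```

For a partial proportional odds model the same idea would apply, but the returned matrix would need one column per level of the response after the first.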
a formula object given to the lrm()
function using variables with these names: y
, time
, yprev
, and group
(factor variable having values '1' and '2'). The yprev
variable is converted to a factor before fitting the model unless yprevfactor=FALSE
.
a formula specifying the part of formula
for which proportional odds is not to be assumed, i.e., that specifies a partial proportional odds model. Specifying ppo
triggers the use of VGAM::vglm()
instead of rms::lrm
and will make the simulations run slower.
see formula
omit this argument if group
has only one regression coefficient in formula
. Otherwise if ppo
is omitted, provide groupContrast
as a list of two lists that are passed to rms::contrast.rms()
to compute the contrast of interest and its standard error. The first list corresponds to group 1, the second to group 2, to get a 2:1 contrast. If ppo
is given and the group effect is not just a simple regression coefficient, specify as groupContrast
a function of a vglm
fit that computes the contrast of interest and its standard error and returns a list with elements named Contrast
and SE
. For the latter type you can optionally have formal arguments n1
, n2
, and parameter
that are passed to groupContrast
to compute the standard error of the group contrast, where n1
and n2
respectively are the sample sizes for the two groups and parameter
is the true group effect parameter value.
applies if ppo
is not used. Set to TRUE
to use the cluster sandwich covariance estimator of the variance of the group comparison.
a function of a time-ordered vector of simulated ordinal responses y
that returns a vector of FALSE
or TRUE
values denoting whether the current y
level met the condition of interest. For example estSeqMarkovOrd
will compute the first time at which y >= 5
if you specify timecriterion=function(y) y >= 5
. This function is only called at the last data look for each simulated study. To have more control, instead of timecriterion
returning a logical vector have it return a numeric 2-vector containing, in order, the event/censoring time and the 1/0 event/censoring indicator.
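The two forms of timecriterion can be sketched as follows. The second function is hypothetical: it uses the position within y as the time, which is only sensible under the assumption that measurement times are 1, 2, 3, and so on, and it treats subjects who never meet the criterion as censored at their last observation.

```r
# Form 1: logical vector, TRUE where the condition of interest is met
tc_logical <- function(y) y >= 5

# Form 2 (hypothetical): numeric 2-vector of time and event indicator
tc_pair <- function(y) {
  hit <- which(y >= 5)
  if (length(hit) > 0) c(hit[1], 1) else c(length(y), 0)
}

y_sim <- c(2, 3, 3, 5, 4, 6)   # made-up responses for one subject
tc_logical(y_sim)
tc_pair(y_sim)
tc_pair(c(1, 2, 3))            # criterion never met
```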
set to TRUE
if timecriterion
is specified and you want to compute a statistic for testing proportional hazards at the last look of each simulated dataset
set to a function of the time vector and the corresponding vector of ordinal responses for a single group if you want to compute a Wilcoxon test on a derived quantity such as the number of days in a given state.
an optional function to do response-dependent sampling. It is a function of these arguments, which are vectors that stop at any absorbing state: times
(ascending measurement times for one subject), y
(vector of ordinal outcomes at these times for one subject). The function returns NULL
if no observations are to be dropped; otherwise it returns the vector of new times to sample.
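A response-dependent sampling function might look like the sketch below. The rule itself is invented for illustration: once a subject first reaches state 2 or lower, later measurements are kept only every 7 time units.

```r
# Hypothetical rule: thin out sampling after the first time y <= 2
rds <- function(times, y) {
  first_low <- which(y <= 2)[1]
  if (is.na(first_low)) return(NULL)        # nothing to drop
  t0   <- times[first_low]
  keep <- times <= t0 | (times - t0) %% 7 == 0
  if (all(keep)) NULL else times[keep]      # NULL when no observations are dropped
}

rds(1:20, c(4, 3, 2, rep(2, 17)))   # subject reaches state 2 at time 3
rds(1:5, rep(4, 5))                 # never reaches state 2: NULL
```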
maximum acceptable absolute value of the contrast estimate, ignored if NULL
. Any values exceeding maxest
will result in the estimate being set to NA
.
like maxest
but for the estimated variance of the contrast estimate
number of simulations (default is 1)
set to TRUE
to send current iteration number to pfile
every 10 iterations. Each iteration actually involves multiple simulations if parameter
has length greater than 1.
file to which to write progress information. Defaults to ''
which is the console. Ignored if progress=FALSE
.
Frank Harrell
Simulates sequential clinical trials of longitudinal ordinal outcomes using a first-order Markov model. Looks are done sequentially after subject ID numbers given in the vector looks
with the earliest possible look being after subject 2. At each look, a subject's repeated records are either all used or all ignored depending on the subject's sequential ID number. For each true effect parameter value, simulation, and look, runs a function to compute the estimate of the parameter of interest along with its variance. For each simulation, data are first simulated for the last look, and these data are sequentially revealed for earlier looks. The user provides a function g
that has extra arguments specifying the true effect, parameter
, and the treatment group
, expecting treatments to be coded 1 and 2. parameter
is usually on the scale of a regression coefficient, e.g., a log odds ratio. Fitting is done using the rms::lrm()
function, unless non-proportional odds is allowed in which case VGAM::vglm()
is used. If timecriterion
is specified, the function also, for the last data look only, computes the first time at which the criterion is satisfied for the subject or uses the event time and event/censoring indicator computed by timecriterion
. The Cox/logrank chi-square statistic for comparing groups on the derived time variable is saved. If coxzph=TRUE
, the survival
package correlation coefficient rho
from the scaled partial residuals is also saved so that the user can later determine to what extent the Markov model resulted in the proportional hazards assumption being violated when analyzing on the time scale. vglm
is accelerated by saving the first successful fit for the largest sample size and using its coefficients as starting values for further vglm
fits for any sample size for the same setting of parameter
.
gbayesSeqSim()
, simMarkovOrd()
, https://hbiostat.org/R/Hmisc/markov/