A model for case-control studies with optional prior distributions for the coefficients, intercept, and auxiliary parameters.
stan_clogit(
  formula,
  data,
  subset,
  na.action = NULL,
  ...,
  strata,
  prior = normal(autoscale = TRUE),
  prior_covariance = decov(),
  prior_PD = FALSE,
  algorithm = c("sampling", "optimizing", "meanfield", "fullrank"),
  adapt_delta = NULL,
  QR = FALSE,
  sparse = FALSE
)
Same as for glmer, except that any global intercept included in the
formula will be dropped.
We strongly advise against omitting the data argument. Unless data is
specified (and is a data frame), many post-estimation functions
(including update, loo, kfold) are not guaranteed to work properly.
Further arguments passed to the function in the rstan package (sampling,
vb, or optimizing), corresponding to the estimation method named by
algorithm. For example, if algorithm is "sampling" it is possible to
specify iter, chains, cores, refresh, etc.
A factor indicating the groups in the data where the number of
successes (possibly one) is fixed by the research design. It may be useful
to use interaction or strata to create this factor. However, the strata
argument must not rely on any object besides the data frame passed to
data.
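As a minimal sketch of the advice above, the stratifying factor can be built with interaction and stored as a column of the data frame, so that the strata argument refers to nothing outside data. The data frame d and its columns clinic, age_band, and matched_set are hypothetical and serve only as an illustration.

```r
# Hypothetical matched data: each clinic x age band combination is one matched set.
d <- data.frame(
  clinic   = c("A", "A", "B", "B", "A", "A", "B", "B"),
  age_band = c("young", "young", "young", "young", "old", "old", "old", "old"),
  case     = c(1, 0, 0, 1, 1, 0, 1, 0)
)

# Build the stratifying factor inside the data frame; drop = TRUE removes
# combinations that never occur in the data.
d$matched_set <- interaction(d$clinic, d$age_band, drop = TRUE)

# Number of cases per stratum, which the research design is assumed to fix.
cases_per_set <- tapply(d$case, d$matched_set, sum)
```

After sorting d by matched_set, the new column could be supplied as strata = matched_set.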
The prior distribution for the (non-hierarchical) regression coefficients.
The default priors are described in the vignette Prior Distributions for
rstanarm Models. If not using the default, prior should be a call to one
of the various functions provided by rstanarm for specifying priors. The
subset of these functions that can be used for the prior on the
coefficients can be grouped into several "families":
Family | Functions |
Student t family | normal, student_t, cauchy |
Hierarchical shrinkage family | hs, hs_plus |
Laplace family | laplace, lasso |
Product normal family | product_normal |
See the priors help page for details on the families and
how to specify the arguments for all of the functions in the table above.
To omit a prior, i.e., to use a flat (improper) uniform prior, set prior
to NULL, although this is rarely a good idea.
Note: Unless QR=TRUE, if prior is from the Student t family or Laplace
family, and if the autoscale argument to the function used to specify the
prior (e.g. normal) is left at its default and recommended value of TRUE,
then the default or user-specified prior scale(s) may be adjusted
internally based on the scales of the predictors. See the priors help page
and the Prior Distributions vignette for details on the rescaling and the
prior_summary function for a summary of the priors used for a
particular model.
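A minimal sketch of a non-default prior from the Student t family follows. The degrees of freedom and scale are illustrative assumptions, not recommendations, and the object name coef_prior is hypothetical. The code runs only if rstanarm is installed.

```r
# Sketch only: executes when rstanarm is available; the values are illustrative.
if (requireNamespace("rstanarm", quietly = TRUE)) {
  # Student t prior on the coefficients with autoscale switched on explicitly.
  coef_prior <- rstanarm::student_t(df = 7, location = 0, scale = 2.5,
                                    autoscale = TRUE)
  # Prior functions return a named list describing the distribution.
  str(coef_prior)
}
```

The object would be supplied as prior = coef_prior in the call to stan_clogit, and prior_summary applied to the fitted model reports the scales actually used after any internal adjustment.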
Cannot be NULL when lme4-style group-specific terms are included in the
formula. See decov for more information about the default arguments.
Ignored when there are no group-specific terms.
A logical scalar (defaulting to FALSE) indicating whether to draw from
the prior predictive distribution instead of conditioning on the outcome.
A string (possibly abbreviated) indicating the estimation approach to
use. Can be "sampling" for MCMC (the default), "optimizing" for
optimization, "meanfield" for variational inference with independent
normal distributions, or "fullrank" for variational inference with a
multivariate normal distribution. See rstanarm-package for more details on
the estimation algorithms. NOTE: not all fitting functions support all four
algorithms.
Only relevant if algorithm="sampling". See the adapt_delta help page for
details.
A logical scalar defaulting to FALSE, but if TRUE applies a scaled QR
decomposition to the design matrix. The transformation does not change the
likelihood of the data but is recommended for computational reasons when
there are multiple predictors. See the QR-argument documentation page for
details on how rstanarm does the transformation and important information
about how to interpret the prior distributions of the model parameters
when using QR=TRUE.
A logical scalar (defaulting to FALSE) indicating whether to use a sparse
representation of the design (X) matrix. If TRUE, the design matrix is not
centered (since that would destroy the sparsity), and it is not possible
to specify both QR = TRUE and sparse = TRUE. Depending on how many zeros
there are in the design matrix, setting sparse = TRUE may make the code
run faster and can consume much less RAM.
A stanreg object is returned for stan_clogit.
The stan_clogit function is mostly similar in syntax to clogit but rather
than performing maximum likelihood estimation of generalized linear
models, full Bayesian estimation is performed (if algorithm is "sampling")
via MCMC. The Bayesian model adds priors (independent by default) on the
coefficients of the GLM.
The data.frame passed to the data argument must be sorted by the variable
passed to the strata argument.
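The sorting requirement can be met and checked in base R before fitting. This sketch uses the infert data set that ships with R, whose stratum column identifies the matched sets. The name sorted_ok is hypothetical.

```r
# infert ships with R (package datasets); stratum identifies the matched sets.
dat <- infert[order(infert$stratum), ]

# Confirm the ordering before fitting: TRUE means the rows are arranged so
# that stratum is nondecreasing, i.e. each matched set is contiguous.
sorted_ok <- !is.unsorted(dat$stratum)
```

If sorted_ok were FALSE, the data frame should be reordered before it is passed to stan_clogit.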
The formula may have group-specific terms like in stan_glmer but should
not allow the intercept to vary by the stratifying variable, since there
is no information in the data with which to estimate such deviations in
the intercept.
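To see why a stratum-specific intercept cannot be estimated, consider the special case of a stratum with exactly one case. The conditional likelihood is then the case's share of exp(linear predictor) within the stratum, so any constant added to every linear predictor in that stratum cancels. The sketch below illustrates that standard formula in base R. It is not rstanarm's internal code, and cond_loglik_one_case and all numbers are hypothetical.

```r
# Conditional log-likelihood of one stratum containing exactly one case.
# eta: linear predictors of the stratum's observations.
# case_idx: position of the single case within eta.
cond_loglik_one_case <- function(eta, case_idx) {
  eta[case_idx] - log(sum(exp(eta)))
}

eta   <- c(0.4, -1.2, 0.7)  # hypothetical linear predictors
shift <- 3.5                # hypothetical stratum-specific intercept

base_ll    <- cond_loglik_one_case(eta, case_idx = 1)
shifted_ll <- cond_loglik_one_case(eta + shift, case_idx = 1)

# The shift cancels, so the two values agree up to floating-point error.
same <- isTRUE(all.equal(base_ll, shifted_ll))
```

Because the likelihood is unchanged by the shift, the data cannot distinguish one stratum intercept from another.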
stanreg-methods and clogit.
The vignette for Bernoulli and binomial models, which has more details on
using stan_clogit.
http://mc-stan.org/rstanarm/articles/
# NOT RUN {
library(rstanarm)

dat <- infert[order(infert$stratum), ] # order by strata
post <- stan_clogit(case ~ spontaneous + induced + (1 | education),
                    strata = stratum,
                    data = dat,
                    subset = parity <= 2,
                    QR = TRUE,
                    chains = 2, iter = 500) # for speed only

nd <- dat[dat$parity > 2, c("case", "spontaneous", "induced", "education", "stratum")]
# next line would fail without case and stratum variables
pr <- posterior_epred(post, newdata = nd) # get predicted probabilities

# not a random variable b/c probabilities add to 1 within strata
all.equal(rep(sum(nd$case), nrow(pr)), rowSums(pr))
# }