The main function for running Bayesian circular GLMs. The model predicts some
circular outcome \(\theta\) and has the form $$\theta_i = \beta_0 +
\delta^t d_i + g(\beta^t x_i) + \epsilon_i,$$ where \(\beta_0\) is an
circular intercept, \(\delta\) are group difference parameters, \(d_i\)
is a vector of dummy variables indicating group membership, \(g(.)\) is a
link function given by \(g(x) = r atan(x)\) where r
can be chosen,
\(\beta\) is a vector of regression coefficients, \(x_i\) is a vector of
covariates, and \(\epsilon_i\) is a von Mises distributed error with
residual concentration \(\kappa\). This function returns a circGLM
object which can be further investigated with standard functions plot
,
print
, coef
, residuals
, and special functions
mcmc_summary.circGLM
for results for all MCMC chains,
IC_compare.circGLM
for a comparison of information criteria of one or
more circGLM models, BF.circGLM
to obtain Bayes Factors, and
predict_function.circGLM
to create a prediction function.
circGLM(
formula,
data,
th,
X = if (missing(th)) { model.matrix(formula, data)[, -1, drop = FALSE] } else {
matrix(nrow = length(th), ncol = 0) },
conj_prior = rep(0, 3),
bt_prior_musd = c(mu = 0, sd = 1),
starting_values = c(0, 1, rep(0, ncol(X))),
bwb = rep(0.05, ncol(X)),
Q = 1000,
burnin = 0,
thin = 1,
kappaModeEstBandwith = 0.1,
CIsize = 0.95,
r = 2,
returnPostSample = TRUE,
returnLLEachPar = FALSE,
output = "list",
SDDBFDensEstMethod = "density",
reparametrize = TRUE,
groupMeanComparisons = TRUE,
skipDichSplit = FALSE,
centerOnly = FALSE
)
an optional object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted.
an optional data frame or object coercible by
as.data.frame
to a data frame, containing the variables in
the model.
An optional vector of angles in radians or degrees, representing
the circular outcome we want to predict. If any value is larger than
2 * pi
, the input is transformed to radians. Otherwise, th
is
treated as radians.
An optional matrix of predictors, both continuous (linear) and categorical (as dummies). If categorical predictors are included, the dummies must already be made and they must be in (0, 1), because this is checked to be able to separate them from the continuous predictors, so that they are treated differently. If not, or if skipDichSplit = TRUE, they will be treated as linear predictors.
A numeric vector of length 3, containing, in that order, prior mean direction, prior resultant length, and prior sample size. Used for the von Mises part of the model, beta_0 and kappa.
A numeric vector of length 2, or NA
. If
bt_prior_musd = NA
, a constant prior is used. If it is a numeric
vector of length 2, a Normal prior is used so that the first value is the
mean, and the second value is the standard deviation.
A numeric vector with starting values for the MCMC sampler. The length of the numeric vector should be 2 plus the number of columns in X.
A numeric vector, where the length is at least the number of
continuous predictors. This is a tuning parameters used in sampling of
beta. New values are sampled uniformly around the current value of beta
with bounds at bt_cur - bwb
and bt_cur + bwb
. If
reparametrize = TRUE
, bwb corresponds to the bounds around the
reparametrized values.
Integer; The number of iterations to perform.
Integer; The number of burn-in (warmup) iterations.
Integer; The number of parameters sets to sample for each
parameter set that is saved. Can be used to save memory if Q
is
large.
Numeric between 0 and 1. The mode of kappa
is estimated by taking the midpoint of a highest density interval.
Specifically, it is the midpoint of the interval that contains
kappaModeEstBandwith
of the density of the posterior. Reasonable
values are roughly between .005 and .2, although lower values may be
reasonable if Q is large.
The size of the credible intervals. This is used for all parameters, whether they use highest density intervals, circular quantiles or regular quantiles.
A numeric. r
is the parameter used in the link function
\(g(x, r) = r atan(x)\). If r = 2
, the link function maps the real
line to the full circle. If r < 2
the link functions maps to a
proportion r / 2
of the circle. If r > 2
, the link functions
can reach the same are of the circle multiple times, which is unlikely to
be useful, and should be used with caution.
Logical indicating whether the MCMC sample itself
should be returned. Should only be set to FALSE
if there are memory
constraints, as many subsequent analyses rely on the posterior sample
directly.
Logical indicating whether to return the likelihood for each observation and each sampled parameter set.
A character string, either "list"
or "vector"
. In
most situations, "list"
should be used, which returns a circGLM
object. The "vector"
options is only useful for simulation studies
etc.
A character string, either "density"
or
"histogram"
. Gives the method to If SDDBFDensEstMethod =
"density"
, the default, the Bayes Factors are computed based on the
density estimate given by a spline interpolation of the density()
function, so they are calculated in R rather than C++. This method should
be much more stable than the histogram method, especially if there is low
probability at 0 in the posterior. If SDDBFDensEstMethod =
"histogram"
, Bayes factors are computed by estimating the density from the
posterior sample as the midpoint of a histogram bar at 0 containing 10% of
the data.
Logical; If TRUE
, proposals for beta are drawn
uniformly around a reparametrization zt = pi * atan(bt) / 2
, so from
zt_can = runif(1, zt - bwb, zt + bwb)
, which is then transformed
back. Then, the proposals amount to the truncated cauchy pdf. If
FALSE
, proposals for beta are drawn on uniformly around beta, so
from bt_can = runif(1, bt_cur - bwb, bt_cur + bwb)
.
Logical indicating whether mean comparisons in the form of Bayes Factors and posterior model probabilities should be computed.
Logical indicating whether to treat categorical
predictor specially. Usually, skipDichSplit = TRUE
should be used.
This removes the arbitrary dependence on the labeling of categorical
predictors and ensures that each group has a regression line of the same
shape. If skipDichSplit = FALSE
, the model will be the same as
lm.circular
from the package circular
in that no separate
treatment for categorical variables is performed.
Logical; If TRUE
, the continuous predictors are
centered only, not standardized. If FALSE
, the continuous predictors
are standardized.
A circGLM
object, which can be further analyzed with its
associated plot.circGLM
, coef.circGLM
and
print.circGLM
functions.
An object of class circGLM
contains the following elements (although
some elements are not returned if not applicable):
b0_meandir
The posterior mean direction of \(\beta_0\), the circular intercept.
b0_CCI
The circular credible interval of of \(\beta_0\), the circular intercept.
kp_mean
The posterior mean of \(\kappa\), the concentration parameter.
kp_mode
The posterior mode of \(\kappa\), the concentration parameter.
kp_HDI
The CIsize
highest posterior
density interval of \(\kappa\).
kp_propacc
The acceptance proportion of the rejection sampler for \(\kappa\).
bt_mean
The posterior means of the regression coefficients \(\beta\).
bt_CCI
The credible intervals of the regression coefficients \(\beta\).
bt_propacc
The acceptance proportions of the Metropolis-Hastings sampler for \(\beta\).
dt_meandir
The posterior mean directions of the group difference parameters, \(\delta\).
dt_CCI
The circular credible intervals of the group difference parameters, \(\delta\).
dt_propacc
The acceptance proportions of the Metropolis-Hastings sampler for \(\delta\).
zt_mean
The posterior means of the reparametrized coefficients \(\zeta\).
zt_mdir
The posterior mean directions of the reparametrized coefficients \(\zeta\).
zt_CCI
The credible intervals of the reparametrized coefficients \(\zeta\).
lppd
Ingredient for information criteria; Log posterior predictive density.
n_par
Ingredient for information criteria; Number of parameters.
ll_th_estpars
Ingredient for information criteria; Log-likelihood of the dataset at estimated parameter set.
ll_each_th_curpars
Ingredient for information criteria; Log-likelihood of each data point at each sampled parameter set.
ll_th_curpars
Ingredient for information criteria; Log-likelihood of the dataset at each sampled parameter set.
th_hat
An n-vector of predicted angles.
b0_chain
A Q-vector of sampled circular intercepts.
kp_chain
A Q-vector of sampled concentration parameters.
bt_chain
A matrix of sampled circular regression coefficients.
dt_chain
A matrix of sampled group difference parameters.
zt_chain
A matrix of sampled reparametrized circular regression coefficients.
mu_chain
A matrix of sampled group means.
AIC_Bayes
A version of the AIC where posterior estimates are used to compute the log-likelihood.
p_DIC
Ingredient for DIC.
p_DIC_alt
Ingredient for DIC.
DIC
The DIC.
DIC_alt
The alternative formulation of the DIC as given in Bayesian Data Analysis, Gelman et al. (2003).
p_WAIC1
Ingredient for WAIC1.
p_WAIC2
Ingredient for WAIC2.
WAIC1
The first formulation of the WAIC as given in Bayesian Data Analysis, Gelman et al. (2003).
WAIC2
The second formulation of the WAIC as given in Bayesian Data Analysis, Gelman et al. (2003).
DeltaIneqBayesFactors
A matrix of inequality Bayes factors for group difference parameters.
BetaIneqBayesFactors
A matrix of inequality Bayes factors for regression parameters.
BetaSDDBayesFactors
A matrix of equality Bayes factors (Savage-Dickey Density ratio) for group difference parameters.
MuIneqBayesFactors
A matrix of inequality Bayes factors for group mean parameters.
MuSDDBayesFactors
A matrix of equality Bayes factors (Savage-Dickey Density ratio) for group mean parameters.
SavedIts
Number of iterations returned, without thinned iterations and burn-in.
TotalIts
Number of iterations performed, including thinning and burn-in.
TimeTaken
Seconds taken for analysis.
BetaBayesFactors
Matrix of Bayes factors for regression parameters.
MuBayesFactors
Matrix of Bayes factors for mean parameters.
all_chains
A matrix with all sampled values of all parameters.
Call
The matched call.
thin
Thinning factor used.
burnin
Burn-in used.
data_th
The original dataset.
data_X
Matrix of used continuous predictors.
data_d
Matrix of used categorical predictors.
data_stX
Matrix of used standardized categorical predictors.
r
Used parameter of the link function.
The model can be passed either as a combination of a formula
and a
data frame or matrix data
, as in lm()
, or as an outcome vector
th
and a matrix of predictors X
. If categorical variables are
to be included that are not yet given as dummies, formula syntax is
recommended as this will automatically take care of dummy creation.
circGLM
performs an MCMC sampler that generates a sample from the
posterior of the intercept \(\beta_0\), regression coefficients
\(\beta\), group mean direction differences \(\delta\) and residual
\(\kappa\).
An attempt is made to split the predictor matrix X
into continuous and
categorical predictors. This is done so that the categorical predictors can
be treated differently, which removes the arbitrary dependence on the
labeling of categorical predictors and ensures that each group has a
regression line of the same shape.
If categorical predictors are passed as factors, formula syntax is
recommended, as it will automatically generate dummy variables. If the
predictors are passed as a matrix X
, categorical variables must be
entered as dummy (dichotomous) variables.
The main results obtained are estimates and credible intervals for the parameters, posterior samples, and Bayes factors for various standard hypothesis comparisons.
As with all MCMC samplers, convergence must be checked, and tuning parameters
bwb
and reparametrize
can be tweaked if the sampler converges
poorly. The circGLM object that is returned contains proportions accepted
which can be used to monitor performance.
print.circGLM
, plot.circGLM
,
coef.circGLM
, BF.circGLM
,
residuals.circGLM
, predict.circGLM
,
predict_function.circGLM
, mcmc_summary.circGLM
,
IC_compare.circGLM
.
# NOT RUN {
dat <- generateCircGLMData()
m <- circGLM(th ~ ., dat)
print(m)
print(m, type = "all")
plot(m, type = "tracestack")
# }
Run the code above in your browser using DataLab