circGLM: Fitting Bayesian circular General Linear Models

Description

The main function for running Bayesian circular GLMs. The model predicts some circular outcome $\theta$ and has the form $$\theta_i = \beta_0 + \delta^t d_i + g(\beta^t x_i) + \epsilon_i,$$ where $\beta_0$ is an circular intercept, $\delta$ are group difference parameters, $d_i$ is a vector of dummy variables indicating group membership, $g(.)$ is a link function given by $g(x) = r atan(x)$ where r can be chosen, $\beta$ is a vector of regression coefficients, $x_i$ is a vector of covariates, and $\epsilon_i$ is a von Mises distributed error with residual concentration $\kappa$. This function returns a circGLM object which can be further investigated with standard functions plot, print, coef, residuals, and special functions mcmc_summary.circGLM for results for all MCMC chains, IC_compare.circGLM for a comparison of information criteria of one or more circGLM models, BF.circGLM to obtain Bayes Factors, and predict_function.circGLM to create a prediction function.

Usage

circGLM(
  formula,
  data,
  th,
  X = if (missing(th)) {     model.matrix(formula, data)[, -1, drop = FALSE] } else {  
      matrix(nrow = length(th), ncol = 0) },
  conj_prior = rep(0, 3),
  bt_prior_musd = c(mu = 0, sd = 1),
  starting_values = c(0, 1, rep(0, ncol(X))),
  bwb = rep(0.05, ncol(X)),
  Q = 1000,
  burnin = 0,
  thin = 1,
  kappaModeEstBandwith = 0.1,
  CIsize = 0.95,
  r = 2,
  returnPostSample = TRUE,
  returnLLEachPar = FALSE,
  output = "list",
  SDDBFDensEstMethod = "density",
  reparametrize = TRUE,
  groupMeanComparisons = TRUE,
  skipDichSplit = FALSE,
  centerOnly = FALSE
)

Arguments

formula

an optional object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted.

data

an optional data frame or object coercible by as.data.frame to a data frame, containing the variables in the model.

An optional vector of angles in radians or degrees, representing the circular outcome we want to predict. If any value is larger than 2 * pi, the input is transformed to radians. Otherwise, th is treated as radians.

An optional matrix of predictors, both continuous (linear) and categorical (as dummies). If categorical predictors are included, the dummies must already be made and they must be in (0, 1), because this is checked to be able to separate them from the continuous predictors, so that they are treated differently. If not, or if skipDichSplit = TRUE, they will be treated as linear predictors.

conj_prior

A numeric vector of length 3, containing, in that order, prior mean direction, prior resultant length, and prior sample size. Used for the von Mises part of the model, beta_0 and kappa.

bt_prior_musd

A numeric vector of length 2, or NA. If bt_prior_musd = NA, a constant prior is used. If it is a numeric vector of length 2, a Normal prior is used so that the first value is the mean, and the second value is the standard deviation.

starting_values

A numeric vector with starting values for the MCMC sampler. The length of the numeric vector should be 2 plus the number of columns in X.

bwb

A numeric vector, where the length is at least the number of continuous predictors. This is a tuning parameters used in sampling of beta. New values are sampled uniformly around the current value of beta with bounds at bt_cur - bwb and bt_cur + bwb. If reparametrize = TRUE, bwb corresponds to the bounds around the reparametrized values.

Integer; The number of iterations to perform.

burnin

Integer; The number of burn-in (warmup) iterations.

thin

Integer; The number of parameters sets to sample for each parameter set that is saved. Can be used to save memory if Q is large.

kappaModeEstBandwith

Numeric between 0 and 1. The mode of kappa is estimated by taking the midpoint of a highest density interval. Specifically, it is the midpoint of the interval that contains kappaModeEstBandwith of the density of the posterior. Reasonable values are roughly between .005 and .2, although lower values may be reasonable if Q is large.

CIsize

The size of the credible intervals. This is used for all parameters, whether they use highest density intervals, circular quantiles or regular quantiles.

A numeric. r is the parameter used in the link function $g(x, r) = r atan(x)$. If r = 2, the link function maps the real line to the full circle. If r < 2 the link functions maps to a proportion r / 2 of the circle. If r > 2, the link functions can reach the same are of the circle multiple times, which is unlikely to be useful, and should be used with caution.

returnPostSample

Logical indicating whether the MCMC sample itself should be returned. Should only be set to FALSE if there are memory constraints, as many subsequent analyses rely on the posterior sample directly.

returnLLEachPar

Logical indicating whether to return the likelihood for each observation and each sampled parameter set.

output

A character string, either "list" or "vector". In most situations, "list" should be used, which returns a circGLM object. The "vector" options is only useful for simulation studies etc.

SDDBFDensEstMethod

A character string, either "density" or "histogram". Gives the method to If SDDBFDensEstMethod = "density", the default, the Bayes Factors are computed based on the density estimate given by a spline interpolation of the density() function, so they are calculated in R rather than C++. This method should be much more stable than the histogram method, especially if there is low probability at 0 in the posterior. If SDDBFDensEstMethod = "histogram", Bayes factors are computed by estimating the density from the posterior sample as the midpoint of a histogram bar at 0 containing 10% of the data.

reparametrize

Logical; If TRUE, proposals for beta are drawn uniformly around a reparametrization zt = pi * atan(bt) / 2, so from zt_can = runif(1, zt - bwb, zt + bwb), which is then transformed back. Then, the proposals amount to the truncated cauchy pdf. If FALSE, proposals for beta are drawn on uniformly around beta, so from bt_can = runif(1, bt_cur - bwb, bt_cur + bwb).

groupMeanComparisons

Logical indicating whether mean comparisons in the form of Bayes Factors and posterior model probabilities should be computed.

skipDichSplit

Logical indicating whether to treat categorical predictor specially. Usually, skipDichSplit = TRUE should be used. This removes the arbitrary dependence on the labeling of categorical predictors and ensures that each group has a regression line of the same shape. If skipDichSplit = FALSE, the model will be the same as lm.circular from the package circular in that no separate treatment for categorical variables is performed.

centerOnly

Logical; If TRUE, the continuous predictors are centered only, not standardized. If FALSE, the continuous predictors are standardized.

Value

A circGLM object, which can be further analyzed with its associated plot.circGLM, coef.circGLM and print.circGLM functions.

An object of class circGLM contains the following elements (although some elements are not returned if not applicable):

b0_meandir: The posterior mean direction of $\beta_0$, the circular intercept.
b0_CCI: The circular credible interval of of $\beta_0$, the circular intercept.
kp_mean: The posterior mean of $\kappa$, the concentration parameter.
kp_mode: The posterior mode of $\kappa$, the concentration parameter.
kp_HDI: The CIsize highest posterior density interval of $\kappa$.
kp_propacc: The acceptance proportion of the rejection sampler for $\kappa$.
bt_mean: The posterior means of the regression coefficients $\beta$.
bt_CCI: The credible intervals of the regression coefficients $\beta$.
bt_propacc: The acceptance proportions of the Metropolis-Hastings sampler for $\beta$.
dt_meandir: The posterior mean directions of the group difference parameters, $\delta$.
dt_CCI: The circular credible intervals of the group difference parameters, $\delta$.
dt_propacc: The acceptance proportions of the Metropolis-Hastings sampler for $\delta$.
zt_mean: The posterior means of the reparametrized coefficients $\zeta$.
zt_mdir: The posterior mean directions of the reparametrized coefficients $\zeta$.
zt_CCI: The credible intervals of the reparametrized coefficients $\zeta$.
lppd: Ingredient for information criteria; Log posterior predictive density.
n_par: Ingredient for information criteria; Number of parameters.
ll_th_estpars: Ingredient for information criteria; Log-likelihood of the dataset at estimated parameter set.
ll_each_th_curpars: Ingredient for information criteria; Log-likelihood of each data point at each sampled parameter set.
ll_th_curpars: Ingredient for information criteria; Log-likelihood of the dataset at each sampled parameter set.
th_hat: An n-vector of predicted angles.
b0_chain: A Q-vector of sampled circular intercepts.
kp_chain: A Q-vector of sampled concentration parameters.
bt_chain: A matrix of sampled circular regression coefficients.
dt_chain: A matrix of sampled group difference parameters.
zt_chain: A matrix of sampled reparametrized circular regression coefficients.
mu_chain: A matrix of sampled group means.
AIC_Bayes: A version of the AIC where posterior estimates are used to compute the log-likelihood.
p_DIC: Ingredient for DIC.
p_DIC_alt: Ingredient for DIC.
DIC: The DIC.
DIC_alt: The alternative formulation of the DIC as given in Bayesian Data Analysis, Gelman et al. (2003).
p_WAIC1: Ingredient for WAIC1.
p_WAIC2: Ingredient for WAIC2.
WAIC1: The first formulation of the WAIC as given in Bayesian Data Analysis, Gelman et al. (2003).
WAIC2: The second formulation of the WAIC as given in Bayesian Data Analysis, Gelman et al. (2003).
DeltaIneqBayesFactors: A matrix of inequality Bayes factors for group difference parameters.
BetaIneqBayesFactors: A matrix of inequality Bayes factors for regression parameters.
BetaSDDBayesFactors: A matrix of equality Bayes factors (Savage-Dickey Density ratio) for group difference parameters.
MuIneqBayesFactors: A matrix of inequality Bayes factors for group mean parameters.
MuSDDBayesFactors: A matrix of equality Bayes factors (Savage-Dickey Density ratio) for group mean parameters.
SavedIts: Number of iterations returned, without thinned iterations and burn-in.
TotalIts: Number of iterations performed, including thinning and burn-in.
TimeTaken: Seconds taken for analysis.
BetaBayesFactors: Matrix of Bayes factors for regression parameters.
MuBayesFactors: Matrix of Bayes factors for mean parameters.
all_chains: A matrix with all sampled values of all parameters.
Call: The matched call.
thin: Thinning factor used.
burnin: Burn-in used.
data_th: The original dataset.
data_X: Matrix of used continuous predictors.
data_d: Matrix of used categorical predictors.
data_stX: Matrix of used standardized categorical predictors.
r: Used parameter of the link function.

Details

The model can be passed either as a combination of a formula and a data frame or matrix data, as in lm(), or as an outcome vector th and a matrix of predictors X. If categorical variables are to be included that are not yet given as dummies, formula syntax is recommended as this will automatically take care of dummy creation.

circGLM performs an MCMC sampler that generates a sample from the posterior of the intercept $\beta_0$, regression coefficients $\beta$, group mean direction differences $\delta$ and residual $\kappa$.

An attempt is made to split the predictor matrix X into continuous and categorical predictors. This is done so that the categorical predictors can be treated differently, which removes the arbitrary dependence on the labeling of categorical predictors and ensures that each group has a regression line of the same shape.

If categorical predictors are passed as factors, formula syntax is recommended, as it will automatically generate dummy variables. If the predictors are passed as a matrix X, categorical variables must be entered as dummy (dichotomous) variables.

The main results obtained are estimates and credible intervals for the parameters, posterior samples, and Bayes factors for various standard hypothesis comparisons.

As with all MCMC samplers, convergence must be checked, and tuning parameters bwb and reparametrize can be tweaked if the sampler converges poorly. The circGLM object that is returned contains proportions accepted which can be used to monitor performance.

Examples

Run this code

# NOT RUN {
dat <- generateCircGLMData()
m   <- circGLM(th ~ ., dat)
print(m)
print(m, type = "all")
plot(m, type = "tracestack")
# }

Run the code above in your browser using DataLab