DirichReg: Fitting a Dirichlet Regression

Description

This function allows for fitting Dirichlet regression models using two different parametrizations.

Usage

DirichReg(formula, data, model = c("common", "alternative"),
          subset, sub.comp, base, weights, control, verbosity = 0)

Arguments

formula

the model formula (for different specifications see Details)

data

a data.frame containing independent and dependent variables

model

specifies whether the "common" ($\alpha\mathrm{s}$) or "alternative" ($\mu/\phi$) parametrization is employed (see Details)

subset

estimates the model for a subset of the data

sub.comp

analyze a subcomposition by selecting specific components (see Details)

base

redefine the base variable

weights

frequency weights

control

a list containing control parameters used for the optimization

verbosity

prints information about the function's progress, see Details

Value

call [language] function call
parametrization [character] used parametrization
varnames [character] components' names
n.vars [numeric] vector with the number of parameters per set of predictors
dims [numeric] number of components
Y [numeric] used components
X [numeric list] sets of predictors
Z [numeric list] sets of predictors (only for the alternative parametrization)
sub.comp [numeric] vector of single components
base [numeric] base (only for the alternative parametrization)
weights [numeric] vector of frequency weights
orig.resp [DirichletRegData] the original response
data [data.frame] original data
d [data.frame] used data
formula [Formula] expanded formula
mf_formula [language] expression for generating the model frame
npar [numeric] number of parameters
coefficients [numeric] named vector of parameters
coefnames [character] names of the parameters
fitted.values [list of matrices] list containing alpha's, mu's, phi's for the observations
logLik [numeric] the log-likelihood
vcov [matrix] covariance-matrix of parameter estimates
hessian [matrix] (observed) Hessian
se [numeric] vector of standard errors
optimization [list] contains details about the optimization process provided by maxBFGS and maxNR

Details

Formula Specification and Models

formula determines the predictors used. The responses must be prepared by DR_data and can optionally be stored in the object containing all covariates, which is then passed as the argument data. (Although on-the-fly processing of DR_data in a formula works, it is intended only for testing purposes and may be removed at any time; use at your own risk.) There are two different parametrizations (controlled by the argument model, see below):

the common parametrization, which models each $\alpha$ by a (possibly individual) set of predictors, and
the alternative parametrization, which models expected values ($\mu$; as in multinomial logistic regression) and precision parameters ($\phi$) with two sets of predictors.
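
How the two parametrizations relate can be illustrated with a small base-R sketch. It assumes the usual Dirichlet relationships, in which the precision $\phi$ is the sum of the $\alpha$s and each expected value is $\mu_c = \alpha_c / \phi$; the numeric values and component names are made up for illustration.

```r
# Hypothetical alpha parameters for one observation with three components
alpha <- c(sand = 2, silt = 5, clay = 3)

# Common -> alternative: the precision is the sum of the alphas,
# and the expected values are the alphas divided by that sum
phi <- sum(alpha)
mu  <- alpha / phi

# Alternative -> common: multiply the expected values by the precision
alpha_back <- mu * phi

print(mu)                             # expected values, summing to 1
print(phi)                            # precision
print(all.equal(alpha, alpha_back))   # round trip recovers the alphas
```

Both parametrizations therefore describe the same distribution; they differ in which quantities the predictors are attached to.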

As the two models offer different modeling strategies, the specification of their formulae differs:

Formulae for the Common Model

The simplest possible model here includes only an intercept for all components. If DV is the dependent variable (i.e., compositional data) with three components, we can request this null model by DV ~ 1. We always have at least two dependent variables, so simple formulae such as the one given above will be expanded to DV ~ 1 | 1 | 1, because DV has three components. Likewise, it is possible to specify a common set of predictors for all components, as in DV ~ p1 * p2, where p1 and p2 are predictors. If the covariates of the components are to differ, one has to set up a complete formula for each component, using | as separators between the components. For example, DV ~ p1 | p1 + p2 | p1 * p2 will lead to a model where the first response in DV is modeled using p1, the second is predicted by p1 + p2, and the third by p1 * p2. Note that if you use the latter approach, the predictors have to be stated explicitly for all response variables.

Formulae for the Alternative Model

The simplest possible model here includes an intercept for all components (except the base) and an intercept for precision. This can be achieved by DV ~ 1, which is expanded to DV ~ 1 | 1. The part modeling the mean (first element on the right-hand side) is mandatory; if no specification for precision is included, an intercept will be added. Note that you need to set model = "alternative" to use this parametrization! The alternative parametrization consists of two parts: modeled expected values ($\mu$) and their precision ($\phi$). As in multinomial logistic regression, one response variable is omitted (by default the first, but this can be changed by the base argument in DR_data or DirichReg), and for the rest a set of predictors is used with a multinomial logit-link. For precisions, a different set of predictors can be set up using a log-link. For example, DV ~ p1 * p2 | p1 + p2 will set up a model where the expected values are predicted by p1 * p2 and the precisions are modeled using p1 + p2.

Data Preparation

The data argument accepts a data.frame that must include the dependent variable as a named element (see Examples for how to do this).

Changing the Base Component and Analyzing Subcompositions

The base component (i.e., the omitted component) is initially set during data preparation with DR_data, but can easily be changed using the argument base, which takes integer values from 1 to the number of components. If a data set contains a large number of components, of which only a few are relevant, the relevant ones can be selected with sub.comp. The irrelevant (i.e., not selected) components will be aggregated into a single variable (row sums) that automatically becomes the base category for the model, unless specified otherwise by base. The positioning of variables will necessarily change: the aggregated variable takes the first column and the others are appended in their order of selection.

Subsets and Weights

Using subset, the model can be fitted to only a part of the data; for more information about this functionality, see subset. Note that, unlike in glm, weights are not treated as prior weights, but as frequency weights!

Optimization and Verbosity

Using the control argument, the settings passed to the optimizers can be altered. This argument takes a named list. To supply user-defined starting values, use control = list(sv=c(...)) and supply a vector containing initial values for all parameters. Optimizer-specific options include the number of iterations (iterlim = 1000) and convergence criteria for the BFGS and NR optimization (tol1 = 1e-5 and tol2 = 1e-10).

verbosity takes integer values from 0 to 4:

0 prints no information (default).
1 prints information about 3 stages (preparation, starting values, estimation).
2 prints little information about optimization (verbosity values greater than one are passed to print.default = verbosity - 1 of maxBFGS and maxNR).
3 prints more information about optimization.
4 prints all information about optimization.
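
The aggregation of non-selected components described for subcompositions can be mimicked in base R. This is only a sketch of the described column layout, not the package's internal code; the matrix comp, its component names, and the column name other are hypothetical.

```r
# Hypothetical compositional data with five components (each row sums to 1)
comp <- matrix(c(0.10, 0.20, 0.30, 0.25, 0.15,
                 0.05, 0.40, 0.20, 0.15, 0.20),
               nrow = 2, byrow = TRUE,
               dimnames = list(NULL, c("a", "b", "c", "d", "e")))

# Components of interest, in the order in which they are selected
selected <- c("d", "b")
rest <- setdiff(colnames(comp), selected)

# Aggregate the non-selected components by row sums; the aggregate takes the
# first column and the selected components follow in their order of selection
sub <- cbind(other = rowSums(comp[, rest, drop = FALSE]),
             comp[, selected, drop = FALSE])

print(sub)
print(rowSums(sub))   # rows of the subcomposition still sum to 1
```

Because the aggregate absorbs everything that was not selected, the result remains a valid composition with fewer columns.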

Examples

ALake <- ArcticLake
ALake$Y <- DR_data(ALake[,1:3])

# fit a quadratic Dirichlet regression model ("common")
res1 <- DirichReg(Y ~ depth + I(depth^2), ALake)

# fit a Dirichlet regression with quadratic predictor for the mean and
# a linear predictor for precision ("alternative")
res2 <- DirichReg(Y ~ depth + I(depth^2) | depth, ALake, model="alternative")

# test both models
anova(res1, res2)

res1
summary(res2)
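
The per-component formula syntax and the control and verbosity arguments described in Details can be combined as in the following sketch. It assumes the DirichletReg package is installed; the choice of predictors and the control values are illustrative, not recommendations.

```r
library(DirichletReg)

# Prepare the response as in the example above
ALake <- ArcticLake
ALake$Y <- DR_data(ALake[, 1:3])

# Common parametrization with a separate set of predictors per component:
# linear in depth for the first and third component, quadratic for the second.
# The control values below are arbitrary illustrative settings.
fit <- DirichReg(Y ~ depth | depth + I(depth^2) | depth, data = ALake,
                 control = list(iterlim = 2000, tol1 = 1e-6, tol2 = 1e-10),
                 verbosity = 1)

# Total number of estimated parameters across the three components
fit$npar
```

With verbosity = 1, the call reports the preparation, starting-value, and estimation stages while it runs.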
