bcplm: Bayesian Compound Poisson Linear Models

Description

This function fits Tweedie compound Poisson linear models using Markov Chain Monte Carlo methods.

Usage

bcplm(formula, link = "log", data, inits = NULL,
  weights, offset, subset, na.action, contrasts = NULL, 
  n.chains = 3, n.iter = 2000, n.burnin = floor(n.iter / 2),
  n.thin = max(1, floor(n.chains * (n.iter - n.burnin) / n.sims)),
  n.sims = 1000, n.report = 2, prior.beta.mean = NULL, 
  prior.beta.var = NULL, bound.phi = 100, bound.p = c(1.01, 1.99), 
  tune.iter = 5000, n.tune = floor(tune.iter/100),  
  basisGenerators = c("tp", "bsp", "sp2d"), doFit = TRUE, ...)

Value

bcplm returns an object of class "bcplm". See bcplm-class for details of the return values as well as various methods available for this class.

Arguments

formula: an object of class formula. See glm and cpglmm for details.
link: a specification for the model link function. This can be either a literal character string or a numeric number. If it is a character string, it must be one of "log", "identity", "sqrt" or "inverse". If it is numeric, it is the same as the link.power argument in the tweedie function. The default is link = "log".
inits: a list of initial values to be used for each chain. It must be of length n.chains. Each element is a named list with the following components: 'beta' (fixed effects), 'phi' (dispersion), and 'p' (index parameter). If the formula indicates a mixed model, it must also contain two additional members 'u' (random effects) and 'Sigma' (variance components). 'Sigma' must be a list of the same format as the ST slot in cpglmm. If not supplied, the function will generate initial values automatically.
data, subset, weights, na.action, offset, contrasts: further model specification arguments as in cpglm; see there for details.
n.chains: an integer indicating the number of Markov chains (default: 3).
n.iter: the number of total iterations per chain (including burn in; default: 2000)
n.burnin: the length of burn in, i.e. number of iterations to discard at the beginning. Default is n.iter/2, that is, discarding the first half of the simulations.
n.thin: thinning rate. Must be a positive integer. Set n.thin > 1 to save memory and computation time if n.iter is large. Default is max(1, floor(n.chains * (n.iter - n.burnin) / 1000)) which will only thin if there are at least 2000 simulations.
n.sims: The approximate number of simulations to keep after thinning (all chains combined).
n.report: if greater than zero, fitting information will be printed out n.report times for each chain.
prior.beta.mean: a vector of prior means for the fixed effects. Default is a vector of zeros.
prior.beta.var: a vector of prior variances for the fixed effects. Default is a vector of 10000's.
bound.phi: a numeric value indicating the upper bound of the uniform prior for the dispersion parameter. The default is 100. The lower bound is set to be 0 in the function.
bound.p: a vector of lower and upper bounds for the index parameter \(p\). The default is c(1.01, 1.99).
tune.iter: the number of iterations used for tuning the proposal variances used in the Metropolis-Hastings updates. These iterations will not be included in the final output. Default is 5000. Set it to be zero if the tuning process is not desired.
n.tune: a positive integer (default: 20). The tune.iter iterations is divided into n.tune loops. Proposal variances are updated at the end of each loop if acceptance rates are outside the desired interval.
basisGenerators: a character vector of names of functions that generate spline bases. See tp for details.
doFit: if FALSE, the constructed "bcplm_input" object is returned before the model is fitted.
...: not used.

Author

Yanwei (Wayne) Zhang actuary_zhang@hotmail.com

Details

This function provides Markov chain Monte Carlo [MCMC] methods for fitting Tweedie compound Poisson linear models within the Bayesian framework. Both generalized linear models and mixed models can be handled. In computing the posterior distribution, the series evaluation method (see, e.g., dtweedie) is employed to evaluate the compound Poisson density.

In the Bayesian model, prior distributions have to be specified for all parameters in the model. Here, Normal distributions are used for the fixed effects (\(\beta\)), a Uniform distribution for the dispersion parameter (\(\phi\)), a Uniform distribution for the index parameter (\(p\)). If a mixed model is specified, prior distributions must be specified for the variance component. If there is one random effect in a group, the inverse Gamma (scale = 0.001, shape = 0.001) is specified as the prior. If there is more than one random effects in a group, the inverse Wishart (identity matrix as the scale and the dimension of the covariance matrix as the shape) is specified as the prior.

Prior means and variances of the fixed effects can be supplied using the argument prior.beta.mean and prior.beta.var, respectively. The prior distribution of \(\phi\) is uniform on (0, bound.phi). And the bounds of the Uniform for \(p\) can be specified in the argument bound.p. See details in section 'Arguments'.

In implementing the MCMC, a Gibbs sampler is constructed in which parameters are updated one at a time given the current values of all the other parameters. Specifically, we use the random-walk Metropolis-Hastings algorithm in updating each parameter except for the variance components, which can be simulated directly due to conjugacy.

Before the MCMC, there is a tuning process where the proposal variances of the (truncated) Normal proposal distributions are updated according to the sample variances computed from the simulations in each tuning loop. The goal is to make the acceptance rate roughly between 40% and 60% for univariate M-H updates. The argument tune.iter determines how many iterations are used for the tuning process, and n.tune determines how many loops these iterations should be divided into. These iterations will not be used in the final output.

The simulated values of all model parameters are stored in the sims.list slot of the returned bcplm object. It is a list of n.chains matrices and each matrix has approximately n.sims rows. The sims.list slot is further coerced to be of class "mcmc.list" so that various methods from the coda package can be directly applied to get Markov chain diagnostics, posterior summary and plots. See coda for available methods.

References

Zhang, Y (2013). Likelihood-based and Bayesian Methods for Tweedie Compound Poisson Linear Mixed Models, Statistics and Computing, 23, 743-757.

Examples

Run this code


if (FALSE) {

# fit the FineRoot data with Bayesian models
# Bayesian cpglm
set.seed(10)
fit1 <- bcplm(RLD ~ factor(Zone) * factor(Stock), 
            data = FineRoot, tune.iter = 2000, 
            n.iter = 6000, n.burnin = 1000, n.thin = 5)

gelman.diag(fit1$sims.list)
# diagnostic plots                             
acfplot(fit1$sims.list, lag.max = 20)
xyplot(fit1$sims.list)                              
densityplot(fit1$sims.list)               
summary(fit1)
plot(fit1)
                          

# now fit the Bayesian model to an insurance loss triangle 
# (see Peters et al. 2009)
fit2 <- bcplm(increLoss ~ factor(year) + factor(lag), 
            data = ClaimTriangle, n.iter = 12000, 
            n.burnin = 2000, n.thin = 10, bound.p = c(1.1, 1.95))
gelman.diag(fit2$sims.list)                   
summary(fit2)

# mixed models 
set.seed(10)
fit3 <- bcplm(RLD ~ Stock * Zone + (1|Plant), 
            data = FineRoot, n.iter = 15000, 
            n.burnin = 5000, n.thin = 10)
gelman.diag(fit3$sims.list)                   
summary(fit3)



}

Run the code above in your browser using DataLab