glmmGS: Fit Generalized Linear Mixed Models

Description

glmmGS is used to fit generalized mixed linear models specified by a string-like formula. The MLE estimation of the parameters is performed by using a Gauss-Seidel algorithm to fit the regression coefficients (see Reference), and a Penalized Quasi Likelihood (PQL) approach to fit the covariance components (see Note). The algorithm is highly optimized to fit fixed effect models to data-sets with a large number of observations and with stratified covariates with a large number of levels, and mixed models with stratified covariates with a large number of levels and diagonal covariance structures. Covariance structures defined by non diagonal dense or sparse precision matrices are also allowed (see glmmGS.CovarianceModel).

Usage

glmmGS(formula, family, data = NULL, covariance.models = NULL, control = glmmGS.Control())

Arguments

formula

a formula object providing a symbolic description of the model to be fitted. The formula object must be of the type response ~ linear-predictor. For a binomial response, the response must be defined as (outcome | counts). The l

family

a description of the error distribution. Only two families are currently allowed: binomial and poisson with a canonical link.

data

an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typica

covariance.models

an optional list or environment containing the covariance models referenced in the formula. If not found in covariance.models, the covariance models are taken from environment(formula), typically the environment from whic

control

a list of parameters for controlling the fitting process. The list is returned by the glmmGS.Control function.

Value

In the current implementation (see Note), the function returns a list of values comprised of:
fixed.effectsa list containing the items: estimates: the estimates of the fixed effect coefficients; standard.errors: the standard errors of the fixed effect coefficient estimates (see Note).
random.effectsa list containing the items: estimates: the estimates of the random effect coefficients; standard.errors: the standard errors of the random effect coefficient estimates (see Note).
covariance.componentsa list containing the items: estimates: the estimates of the covariance components; standard.errors: the standard errors of the covariance components.
iterationsthe number of iterations.

Details

If a fixed intercept is specified (either a global intercept or a stratified intercept), then each stratified random intercept is constrained to have a zero mean. If both global and stratified fixed intercepts are specified, then their fitted values are determined up to a random constant. This is because the Gauss-Seidel algorithm displays best convergence properties by updating each fixed intercept component-wise without imposing any constraints, such as setting the value of the intercept associated with the first level of a factor equal to zero. (Notice that the same does not hold for random intercepts, where removing the overall mean is necessary to boost converge.) A simple post-processing of the fitted value can be used to adjust the offset of the estimated fixed intercepts.

References

Guha, S., Ryan, L., Morara M. (2009) Gauss-Seidel Estimation of Generalized Linear Mixed Models With Application to Poisson Modeling of Spatially Varying Disease Rates. Journal of Computational and Graphical Statistics, 4, 810--837.

Examples

Run this code

# Set data dimension: 
	# - 1 million observations, 
	# - 7 'dense' fixed effect covariates,
	# - stratified random effects with 2000 strata (levels)
	nobs <- 1000000;
	ndays <- 7;
	nPostCodes <- 2000;

	# Generate data
	counts <- as.integer(runif(nobs, 1, 10));
	days <- matrix(rnorm(nobs * ndays), nrow = nobs, ncol = ndays);
	seifa <- rnorm(nobs);
	postCodes <- as.factor(as.integer(runif(nobs, 0, nPostCodes)));
	ipostCodes <- as.integer(postCodes) - 1L; # zero-based vector of indices
	
	# Generate coefficients
	offset = runif(nobs, -1, 1);
	intercept = -0.5;
	sd.days <- 0.1;
	beta.days <- rnorm(ndays, sd = sd.days);
	sd.seifa <- 0.1;
	intercept.seifa <- rnorm(nPostCodes, sd = sd.seifa);
	beta.seifa <- rnorm(nPostCodes, sd = sd.seifa);
	
	# Generate linear predictor
	eta <- intercept + days %*% beta.days;
	eta <- eta + intercept.seifa[as.integer(postCodes)] + seifa * beta.seifa[as.integer(postCodes)];

	# Generate response
	y <- as.integer(rbinom(nobs, counts, plogis(eta)));

	# Define identity precision model and control list
	I <- glmmGS.CovarianceModel("identity");
	control <- glmmGS.Control(reltol = 1.e-8, abstol = 1.e-25, maxit = 200);
	
	# Fit model using Gauss-Seidel algorithm with two blocks: 
	# - one fixed effect block: (1 + days);
	# - one stratified random effect block with an identity  
	#   covariance model: (1 + seifa | ipostCodes ~ I).
	formula = (y | counts) ~ offset(offset) + (1 + days) + (1 + seifa | ipostCodes ~ I);
	results <- glmmGS(formula = formula, family = binomial, control = control);

Run the code above in your browser using DataLab