HLfit: Fit mixed models with given correlation matrix

Description

This function fits GLMMs as well as some hierarchical generalized linear models (HGLM; Lee and Nelder 2001). HLfit fits both fixed effects parameters, and dispersion parameters i.e. the variance of the random effects (full covariance for random-coefficient models), and the variance of the residual error. The linear predictor is of the standard form offset+ X beta + Z b, where X is the design matrix of fixed effects and Z is a design matrix of random effects (typically an incidence matrix with 0s and 1s, but not necessarily). Models are fitted by an iterative algorithm alternating estimation of fixed effects and of dispersion parameters. The residual dispersion may follow a “structured-dispersion model” modeling heteroscedasticity. Estimation of the latter parameters is performed by a form of fit of debiased residuals, which allows fitting a structured-dispersion model (Smyth et al. 2001). However, evaluation of the debiased residuals can be slow in particular for large datasets. For models without structured dispersion, it is then worth using the fitme function (or the corrHLfit function with non-default arguments). These functions can optimize the likelihood of HLfit fits for different given values of the dispersion parameters (“outer optimization”), thereby avoiding the need to estimate debiased residuals.

Usage

HLfit(formula, data, family = gaussian(), rand.family = gaussian(), 
      resid.model = ~1, REMLformula = NULL, 
      verbose = c(trace = FALSE), HLmethod = "HL(1,1)", control.HLfit = list(), 
      control.glm = list(), init.HLfit = list(), ranFix = list(), 
      etaFix = list(), prior.weights = NULL, processed = NULL)
## see 'rand.family' argument for inverse.Gamma

Arguments

formula

A formula; or a predictor, i.e. a formula with attributes created by Predictor, if design matrices for random effects have to be provided. See Details in spaMM for allowed terms in the formula (except spatial ones).

data

A data frame containing the variables named in the model formula.

family

A family object describing the distribution of the response variable. See Details in spaMM for handled families.

rand.family

A family object describing the distribution of the random effect, or a list of family objects for different random effects (see Examples). Possible options are gaussian(), Gamma(log), Gamma(identity) (see Details), Beta(logit), inverse.Gamma(-1/mu), and inverse.Gamma(log). For discussion of these alternatives see Lee and Nelder 2001 or Lee et al. 2006, p. 178-. Here the family gives the distribution of a random effect \(u\) and the link gives v as function of \(u\) (see Details). If there are several random effects and only one family is given, this family holds for all random effects.

resid.model

Either a formula (without left-hand side) for the dispersion parameter phi of the residual error. A log link is assumed by default; or a list, with at most three possible elements if its formula involves only fixed effects:

formula: model formula as in formula-only case, without left-hand side
family: Always Gamma, with by default a log link. Gamma(identity) can be tried but may fail because only the log link ensures that the fitted \(\phi\) is positive.
fixed: can be used to specify the residual dispersion parameter of the residual dispersion model itself. The default value is 1; this argument can be used to set another value, and fixed=list(phi=NA) will force estimation of this parameter.

and additional possible elements (all named as fitme arguments) if its formula involves random effects: see phiHGLM.

REMLformula

A model formula that allows the estimation of dispersion parameters, and computation of restricted likelihood (p_bv) under a model different from the predictor formula.

For example, if only random effects are included in REMLformula, an ML fit is performed and p_bv equals the marginal likelihood (or its approximation), p_v. This ML fit can be performed more simply by setting HLmethod="ML" and leaving REMLformula at its default NULL value.

verbose

A vector of booleans. trace controls various diagnostic messages (possibly messy) about the iterations. TRACE=TRUE is most useful to follow the progress of a long computation, particularly in fitme or corrHLfit calls, for which it displays some mysterious output for each set of correlation and dispersion parameter values considered by the optimiser. Non-boolean values of TRACE are meaningful, but the source code of spaMM:::.doTRACE should be consulted for their meaning. phifit (which defaults to TRUE) controls messages about the progress of residual dispersion fits in DHGLMs.

HLmethod

Allowed values are "REML", "ML", "EQL-" and "EQL+" for all models; "PQL" (="REPQL") and "PQL/L" for GLMMs only; and further values for those curious to experiment (see Details). The default is REML (standard REML for LMMs, an extended definition for other models). REML can be viewed as a fom of conditional inference, and non-standard conditionings can be called as “REML” with a non-standard REMLformula. See Details for further information.

control.HLfit

A list of parameters controlling the fitting algorithms, which should be ignored in routine use. In addition, a resid.family parameter was previously documented here (before version 2.6.40), and will still operate as previously documented, but should not be used in new code.

Possible parameters are:

conv.threshold and spaMM_tol: spaMM_tol is a list of tolerance values, with elements Xtol_rel and Xtol_abs that define thresholds for relative and absolute changes in parameter values in iterative algorithms (used in tests of the form “d(param)< Xtol_rel * param + Xtol_abs”, so that Xtol_abs is operative only for small parameter values). conv.threshold is the older way to control Xtol_rel. Default values are given by spaMM.getOption("spaMM_tol");

break_conv_logL, a boolean specifying whether the iterative algorithm should terminate when log-likelihood appears to have converged (roughly, when its relative variation over on iteration is lower than 1e-8). Default is FALSE (convergence is then assessed on the parameter estimates rather than on log-likelihood). iter.mean.dispFix, the number of iterations of the iterative algorithm for coefficients of the linear predictor, if no dispersion parameters are estimated by the iterative algorithm. Defaults to 200 except for Gamma(log)-family models; iter.mean.dispVar, the number of iterations of the iterative algorithm for coefficients of the linear predictor, if some dispersion parameter(s) is estimated by the iterative algorithm. Defaults to 50 except for Gamma(log)-family models; max.iter, the number of iterations of the iterative algorithm for joint estimation of dispersion parameters and of coefficients of the linear predictor. Defaults to 200. This is typically much more than necessary, unless there is little information to separately estimate \(\lambda\) and \(\phi\) parameters.

control.glm

List of parameters controlling GLM fits, passed to glm.control; e.g. control.glm=list(maxit=100). See glm.control for further details.

init.HLfit

A list of initial values for the iterative algorithm, with possible elements of the list are fixef for fixed effect estimates (beta), v_h for random effects vector v in the linear predictor, lambda for the parameter determining the variance of random effects \(u\) as drawn from the rand.family distribution phi for the residual variance. However, this argument can be ignored in routine use.

ranFix

A list of fixed values of random effect parameters. See ranFix for further information.

etaFix

A list of given values of the coefficients of the linear predictor. See etaFix for further information.

prior.weights

An optional vector of prior weights as in glm. This fits the data to a probability model with residual variance phi/prior.weights, and all further outputs are defined to be consistent with this (see section IV in Details).

processed

A list of preprocessed arguments, for programming purposes only (as in corrHLfit).

Value

An object of class HLfit, which is a list with many elements, not all of which are documented.

A few extractor functions are available (see extractors), and should be used as far as possible as they should be backward-compatible from version 1.4 onwards, while the structure of the return object may still evolve. The following information will be useful for extracting further elements of the object.

Elements include descriptors of the fit:

eta

Fitted values on the linear scale (including the predicted random effects);

Fitted values (\(\mu=\)<inverse-link>(\(\eta\))) of the response variable (returned by the fitted function);

fixef

The fixed effects coefficients, \(\beta\) (returned by the fixef function);

ranef

The random effects \(u\) (returned by ranef(*,type="uncorrelated");

v_h

The random effects on the linear scale, \(v\);

phi

The residual variance \(\phi\);

phi.object

A possibly more complex object describing \(\phi\);

lambda

The random-effect (\(u\)) variance(s) \(\lambda\) in compact form;

lambda.object

A possibly more complex object describing \(\lambda\);

corrPars

Agglomerates information on correlation parameters, either fixed, or estimated by HLfit, corrHLfit or fitme;

APHLs

A list which elements are various likelihood components, include conditional likelihood, h-likelihood, and the two adjusted profile h-likelihoods: the (approximate) marginal likelihood p_v and the (approximate) restricted likelihood p_bv (the latter two available through the logLik function). See the extractor function get_any_IC for information criteria (“AIC”) and effective degrees of freedom;

The covariance matrix of \(\beta\) estimates is not included as such, but can be extracted by vcov;

Information about the input is contained in output elements named as HLfit or corrHLfit arguments (data,family,resid.family,ranFix,prior.weights), with the following notable exceptions or modifications:

predictor

The formula, possibly reformatted;

resid.predictor

Analogous to predictor, for the residual variance;

rand.families

corresponding to the rand.family input;

Further miscellaneous diagnostics and descriptors of model structure:

X.pv

The design matrix for fixed effects;

ZAlist,strucList

Two lists of matrices, respectively the design matrices “Z”, and the “L” matrices, for the different random-effect terms. The extractor get_ZALMatrix can be used to reconstruct a single “ZL” matrix for all terms.

BinomialDen

(binomial data only) the binomial denominators;

the response vector; for binomial data, the frequency response.

models

Additional information on model structure for \(\eta\), \(\lambda\) and \(\phi\);

A set of indices that characterize the approximations used for likelihood;

leve_phi,lev_lambda

Leverages;

dfs

degrees of freedom for different components of the model;

warnings

A list of warnings for events that may have occurred during the fit.

Finally, the object includes programming tools: call, spaMM.version, fit_time and envir.

Details

I. Fitting methods: Many approximations for likelihood have been defined to fit mixed models (e.g. Noh and Lee (2007) for some overview), and this function implements several of them, and some additional ones. In particular, PQL as originally defined by Breslow and Clayton (1993) uses REML to estimate dispersion parameters, but this function allows one to use an ML variant of PQL. Moreover, it allows some non-standard specification of the model formula that determines the conditional distribution used in REML.

EQL stands for the EQL method of Lee and Nelder (2001). The '+' version includes the d v/ d tau correction described p. 997 of that paper, and the '-' version ignores it. PQL can be seen as the version of EQL- for GLMMs. It estimates fixed effects by maximizing h-likelihood and dispersion parameters by an approximation of REML, i.e. by maximization of an approximation of restricted likelihood. PQL/L is PQL without the leverage corrections that define REML estimation of random-effect parameters. Thus, it estimates dispersion parameters by an approximation of marginal likelihood.

HLmethod also accepts values of the form "HL(<...>)", "ML(<...>)" and "RE(<...>)", e.g. HLmethod="RE(1,1)", which allow a more direct specification of the approximations used. HL and RE are equivalent (both imply an REML correction). The first '1' means that a first order Laplace approximation to the likelihood is used to estimate fixed effects (a '0' would instead mean that the h likelihood is used as the objective function). The second '1' means that a first order Laplace approximation to the likelihood or restricted likelihood is used to estimate dispersion parameters, this approximation including the dv/d tau term specifically discussed by Lee & Nelder 2001, p. 997 (a '0' would instead mean that these terms are ignored).

It is possible to enforce the EQL approximation for estimation of dispersion parameter (i.e., Lee and Nelder's (2001) method) by adding a third index with value 0. "EQL+" is thus "HL(0,1,0)", while "EQL-" is "HL(0,0,0)". "PQL" is EQL- for GLMMs. "REML" is "HL(1,1)". "ML" is "ML(1,1)".

Some of these distinctions make sense for GLMs, and glm methods use approximations, which make a difference for Gamma GLMs. This means in particular that, (as stated in stats::logLik) the logLik of a Gamma GLM fit by glm differs from the exact likelihood. Further, the dispersion estimate returned by summary.glm differs from the one implied by logLik, because summary.glm uses Pearson residuals instead of deviance residuals, and no HLmethod tries to reproduce simultaneously these distinct behaviours. An "ML(0,0,0)" approximation of true ML provides the same log likelihood as stats::logLik, and the dispersion estimate returned by an "HL(.,.,0)" fit matches what can be computed from residual deviance and residual degrees of freedom of a glm fit, but this is not the estimate displayed by summary.glm. With a log link, the fixed effect estimates are unaffected by these distinctions.

II. Random effects are constructed in several steps. first, a vector u of independent and identically distributed (iid) random effects is drawn from some distribution; second, a transformation v=f(u) is applied to each element (this defines v which elements are still iid); third, correlated random effects are obtained as Lv where L is the “square root” of a correlation matrix (this may be meaningful only for Gaussian random effects). Coefficients in a random-coefficient model correspond to Lv. Finally, a matrix Z (or sometimes ZA, see Predictor) allows to specify how the correlated random effects affect the response values. In particular, Z is the identity matrix if there is a single observation (response) for each location, but otherwise its elements \(z_{ji}\) are 1 for the \(j\)th observation in the \(i\)th location. The design matrix for v is then of the form ZL.

The specification of the random effects u and v handles the following cases: Gaussian with zero mean, unit variance, and identity link; Beta-distributed, where \(u ~ B(1/(2\lambda),1/(2\lambda))\) with mean=1/2, and var\(=\lambda/[4(1+\lambda)]\); and with logit link v=logit(u); Gamma-distributed random effects, where \(u ~ \)Gamma(shape=1+1/\(\lambda\),scale=1/\(\lambda\)): see Gamma for allowed links and further details; and Inverse-Gamma-distributed random effects, where \(u ~ \)inverse-Gamma(shape=1+1/\(\lambda\),rate=1/\(\lambda\)): see inverse.Gamma for allowed links and further details.

III. The standard errors reported may sometimes be misleading. For each set of parameters among \(\beta\), \(\lambda\), and \(\phi\) parameters these are computed assuming that the other parameters are known without error. This is why they are labelled Cond. SE (conditional standard error). This is most uninformative in the unusual case where \(\lambda\) and \(\phi\) are not separately estimable parameters. Further, the SEs for \(\lambda\) and \(\phi\) are rough approximations as discussed in particular by Smyth et al. (2001; \(V_1\) method).

IV. prior weigths. This controls the likelihood analysis of heteroscedastic models. In particular, increasing the weights by a constant factor f should, and will, yield (Intercept) estimates of phi also increased by f (except if a non-trivial resid.formula with log link is used). This is consistent with what glm does, but some widely used packages do not follow this logic.

References

Breslow, NE, Clayton, DG. (1993). Approximate Inference in Generalized Linear Mixed Models. Journal of the American Statistical Association 88, 9-25.

Lee, Y., Nelder, J. A. (2001) Hierarchical generalised linear models: A synthesis of generalised linear models, random-effect models and structured dispersions. Biometrika 88, 987-1006.

Lee, Y., Nelder, J. A. and Pawitan, Y. (2006). Generalized linear models with random effects: unified analysis via h-likelihood. Chapman & Hall: London.

Noh, M., and Lee, Y. (2007). REML estimation for binary data in GLMMs, J. Multivariate Anal. 98, 896-915.

Smyth GK, Huele AF, Verbyla AP (2001). Exact and approximate REML for heteroscedastic regression. Statistical Modelling 1, 161-175.

Examples

Run this code

# NOT RUN {
data("wafers")
## Gamma GLMM with log link
# }
# NOT RUN {
<!-- % example also in main page... -->
# }
# NOT RUN {
HLfit(y ~X1+X2+X1*X3+X2*X3+I(X2^2)+(1|batch),family=Gamma(log),
          resid.model = ~ X3+I(X3^2) ,data=wafers)
# }
# NOT RUN {
<!-- %- : tested in update.Rd -->
# }
# NOT RUN {
## Gamma - inverseGamma HGLM with log link
HLfit(y ~X1+X2+X1*X3+X2*X3+I(X2^2)+(1|batch),family=Gamma(log),
          HLmethod="HL(1,1)",rand.family=inverse.Gamma(log),
          resid.model = ~ X3+I(X3^2) ,data=wafers)
# }

Run the code above in your browser using DataLab