getY: Get Model Response Variable

Description

Extract the response variable from a fitted model on the original or link scale (for GLMs).

Usage

getY(mod, family = NULL, data = NULL, link = FALSE, ...)

Arguments

mod

A fitted model object of class "lm", "glm", or "merMod". Alternatively, a numeric vector, corresponding to a variable to be transformed. Can also be a list or nested list of such objects.

family

Optional, the error distribution family containing the link function which will be used to transform the response (see family for specification details).

data

An optional dataset used to first re-fit the model(s).

link

Logical. If TRUE, return the response variable on the link scale (see Details).

...

Not currently used.

Value

A numeric vector comprising the response variable in the original or link scale, or an array, list of vectors/arrays, or nested list.

Details

getY returns the response variable from a model by summing the fitted values and the response residuals. If link = TRUE and the model is a GLM, the response is transformed using the model's link function. If this transformation produces undefined values, however, the transformed response is replaced by an estimate based on the 'working' response variable of the GLM (see below). The function can also be used to transform a variable (supplied to mod) using the link function from the specified family, in which case the link argument is ignored.
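As a rough illustration of the two points above, the following base R sketch (it does not call getY, and the simulated data and object names are assumptions made for the example) recovers a response from the fitted values and response residuals, and shows why a direct link transformation can fail:

```r
# Simulated binary data; all names here are illustrative only
set.seed(1)
x <- rnorm(50)
y <- as.numeric(rbinom(50, size = 1, prob = plogis(0.5 * x)))
m <- glm(y ~ x, family = binomial)

# Response recovered as fitted values + response residuals
y_rec <- unname(fitted(m) + resid(m, type = "response"))
all.equal(y_rec, y)

# Direct logit transformation of a 0/1 response gives only infinite values
y_link <- family(m)$linkfun(y)
all(is.infinite(y_link))
```

The infinite values in the second part are the 'undefined' case for which an estimate based on the working response is needed.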

Estimating the link-transformed response

A key challenge in generating fully standardised model coefficients for a generalised linear model (GLM) with a non-gaussian link function is the difficulty in calculating appropriate standardised ranges (typically the standard deviation) for the response variable on the link scale. This is because directly transforming the response will often produce undefined values (e.g. the log of a zero count is negative infinity). Although methods for circumventing this issue by indirectly estimating the variance of the link-transformed response have been proposed, including a latent-theoretic approach for binomial models (McKelvey & Zavoina 1975) and a more general variance-based method using a pseudo R-squared (Menard 2011), an alternative approach is used here. Where transformed values are undefined, the function will instead return the synthetic 'working' response from the iteratively reweighted least squares (IWLS) algorithm of the GLM (McCullagh & Nelder 1989). This is reconstructed as the sum of the linear predictor and the working residuals, with the latter comprising the error of the model on the link scale. The advantage of this approach is that a relatively straightforward 'transformation' of any non-gaussian response is readily attainable in all cases. The standard deviation (or other relevant range) can then be calculated using values of the transformed response and used to scale the coefficients. An additional benefit for piecewise SEMs is that the transformed rather than original response can then be specified as a predictor in other models, ensuring that standardised indirect and total effects are calculated correctly (i.e. using the same units for the variable).
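The working response described above can be sketched in base R as follows. This is a simplified illustration under stated assumptions: the data are simulated, the object names are invented for the example, and the working response is taken directly from the fitted model rather than from the iterative procedure that getY uses, so the values will generally differ from those returned by getY.

```r
# Simulated count data; all object names here are illustrative
set.seed(1)
x <- rnorm(50)
y <- rpois(50, lambda = exp(0.3 * x))
m <- glm(y ~ x, family = poisson)

# A direct log transformation is undefined wherever y is zero
any(is.infinite(log(y)))

# Working response: linear predictor + working residuals
eta <- predict(m, type = "link")
yw <- eta + resid(m, type = "working")

# The working residuals are the response-scale errors mapped to the link
# scale, (y - mu) / (d mu / d eta), so the same values can be built by hand
mu <- fitted(m)
yw2 <- eta + (y - mu) / family(m)$mu.eta(eta)
all.equal(yw, yw2)

# Standard deviation on the link scale, used here to scale the slope of x
b_std <- coef(m)[["x"]] * sd(x) / sd(yw)
```

Unlike log(y), yw is finite for every observation, so its standard deviation is always available for scaling.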

To ensure a high level of 'accuracy' in the working response - in the sense that the inverse-transformed values are practically indistinguishable from the original response - the function uses the following iterative fitting procedure to calculate a 'final' working response:

1. A new GLM of the same error family is fit with the original response variable as both predictor and response, using a single IWLS iteration
2. The working response is calculated from this model
3. The inverse transformation of the working response is then calculated
4. If the inverse transformation is effectively equal to the original response (tested using all.equal with the default tolerance), the working response is returned; otherwise, the GLM is re-fit with the working response now as the predictor, and steps 2-4 are repeated - each time with an additional IWLS iteration

This approach will generate a very reasonable transformation of the response variable, which will also closely resemble the direct transformation where this can be compared - see Examples. It also ensures that the transformed values, and hence the standard deviation, are the same for any GLM fitting the same response - provided it uses the same link function - and so facilitates model comparisons, selection, and averaging.
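The iterative procedure can be wrapped in a small function, sketched below. The helper working_response() and its arguments are hypothetical and are not part of semEff; the code is a minimal reading of the steps described in this section, not the package's implementation. It assumes that family is supplied as a family object and stops after max_steps re-fits.

```r
# Hypothetical helper (not part of semEff): iterative working response.
# `family` must be a family object, e.g. poisson().
working_response <- function(y, family = poisson(), max_steps = 50) {
  y <- as.numeric(y)
  d <- data.frame(y = y, x = y)
  # Fit with the response as both predictor and response, one IWLS iteration
  m <- suppressWarnings(
    glm(y ~ x, family = family, data = d, control = list(maxit = 1))
  )
  for (i in seq_len(max_steps)) {
    # Working response = linear predictor + working residuals
    yl <- unname(predict(m) + resid(m, "working"))
    # Inverse transformation back to the original scale
    yli <- family$linkinv(yl)
    # Return if effectively equal to the original response
    if (isTRUE(all.equal(yli, y))) return(yl)
    # Otherwise re-fit with the working response as predictor, allowing
    # one more IWLS iteration than in the previous fit
    d$x <- yl
    m <- suppressWarnings(
      glm(y ~ x, family = family, data = d, control = list(maxit = i + 1))
    )
  }
  warning("no working response matched the original within max_steps")
  yl
}

# Usage: compare with the direct log transformation of a poisson variable
set.seed(1)
y <- rpois(30, lambda = 10)
yl <- working_response(y)
all.equal(yl, log(y))
```

Because the helper depends only on the response and the family, any two GLMs that share a response and link function would receive the same transformed values, which is the property noted above.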

References

Grace, J.B., Johnson, D.J., Lefcheck, J.S. and Byrnes, J.E.K. (2018) Quantifying relative importance: computing standardized effects in models with binary outcomes. Ecosphere 9, e02283. https://doi.org/gdm5bj

McCullagh, P. and Nelder, J. A. (1989) Generalized Linear Models (2nd Edition). London: Chapman and Hall.

McKelvey, R. D. and Zavoina, W. (1975) A statistical model for the analysis of ordinal level dependent variables. The Journal of Mathematical Sociology 4, 103-120. https://doi.org/dqfhpp

Menard, S. (2011) Standards for Standardized Logistic Regression Coefficients. Social Forces 89, 1409-1428. https://doi.org/bvxb6s

Examples

# NOT RUN {
## SEM responses (original scale)
getY(Shipley.SEM)

## Estimated response in link scale from binomial model
m <- Shipley.SEM$Live
getY(m, link = TRUE)
getY(m, link = TRUE, family = binomial("probit"))  # different link function

## Same estimate calculated using variable instead of model
y <- Shipley$Live
getY(y, binomial)

## Compare estimate with a direct link transformation
## (test with a poisson model, log link)
set.seed(1)
y <- rpois(30, lambda = 10)
y2 <- y
m <- suppressWarnings(
  glm(y ~ y2, poisson, control = list(maxit = 1))
)
i <- 0
repeat {
  yl <- predict(m) + resid(m, "working")
  yli <- family(m)$linkinv(yl)
  eql <- isTRUE(all.equal(yli, y, check.names = FALSE))
  if (eql) break else {  # stop once the inverse-transformed values match y
    i <- i + 1
    m <- suppressWarnings(
      update(m, . ~ yl, control = list(maxit = i))
    )
  }
}
## Effectively equal?
all.equal(yl, log(y), check.names = FALSE)
# TRUE
## Actual difference...
all.equal(yl, log(y), check.names = FALSE, tolerance = .Machine$double.eps)
# "Mean relative difference: 1.05954e-12"
# }
