Using a fitted model object, determine a reference grid for which estimated
marginal means are defined. The resulting ref_grid object encapsulates
all the information needed to calculate EMMs and make inferences on them.
ref_grid(object, at, cov.reduce = mean, mult.names, mult.levs,
options = get_emm_option("ref_grid"), data, df, type,
transform = c("none", "response", "mu", "unlink", "log"), nesting,
covnest = FALSE, offset, ...)
object
An object produced by a supported model-fitting function, such as lm.
Many models are supported. See vignette("models", "emmeans").
at
Optional named list of levels for the corresponding variables.
cov.reduce
A function, logical value, or formula; or a named list of these. Each
covariate not specified in at is reduced according to these
specifications. See the section below on “Using cov.reduce”.
mult.names
Character value: the name(s) to give to the pseudo-factor(s) whose levels
delineate the elements of a multivariate response. If this is provided, it
overrides the default name(s) used for class(object) when it has a
multivariate response (e.g., the default is "rep.meas" for "mlm" objects).
mult.levs
A named list of levels for the dimensions of a multivariate response. If
there is more than one element, the combinations of levels are used, in
expand.grid order. The (total) number of levels must match the number of
dimensions. If mult.names is specified, this argument is ignored.
options
If non-NULL, a named list of arguments to pass to update.emmGrid, just
after the object is constructed.
data
A data.frame to use to obtain information about the predictors (e.g.,
factor levels). If missing, then recover_data is used to attempt to
reconstruct the data. See the note with recover_data for an important
precaution.
df
Numeric value. This is equivalent to specifying options(df = df). See
update.emmGrid.
type
Character value. If provided, this is saved as the "predict.type" setting.
See update.emmGrid and the section below on prediction types and
transformations.
transform
Character value. If other than "none", the reference grid is reconstructed
via regrid with the given transform argument. See the section below on
prediction types and transformations.
nesting
If the model has nested fixed effects, this may be specified here via a
character vector or named list specifying the nesting structure.
Specifying nesting overrides any nesting structure that is automatically
detected. See Details.
covnest
Logical value. If TRUE, covariates having more than one value in the
reference grid are included when auto-detecting nesting. Set this to TRUE
only if you have covariate values that logically depend on some other
factor's levels.
offset
Numeric scalar value (if a vector, only the first element is used). This
may be used to add an offset, or override offsets based on the model. A
common usage would be to specify offset = 0 for a Poisson regression
model, so that predictions from the reference grid become rates relative
to the offset that had been specified in the model.
...
Optional arguments passed to emm_basis, such as vcov. (see Details below)
or options for certain models (see vignette("models", "emmeans")).
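To illustrate the offset argument, here is a minimal sketch on simulated data. The data frame d, its columns g, expo and y, and all object names are hypothetical and exist only for this example. With offset = 0, the linear predictor in the reference grid should contain only the fixed-effects part, so the back-transformed predictions can be read as rates per unit of exposure.

```r
library(emmeans)

# Simulated (hypothetical) count data with unequal exposure
set.seed(42)
d <- data.frame(
  g    = factor(rep(c("a", "b"), each = 20)),
  expo = runif(40, 1, 10)
)
d$y <- rpois(40, lambda = d$expo * ifelse(d$g == "a", 0.5, 1.5))

pois.glm <- glm(y ~ g + offset(log(expo)), family = poisson, data = d)

# Override the model's offset with 0
rg_rate <- ref_grid(pois.glm, offset = 0)

log_rate <- predict(rg_rate)                     # linear-predictor (log) scale
rate     <- predict(rg_rate, type = "response")  # rate per unit exposure
```

Without the offset argument, the offset is instead evaluated from the reference-grid value of expo, so predictions refer to that exposure rather than to a unit exposure.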
An object of the S4 class "emmGrid" (see emmGrid-class). These objects
encapsulate everything needed to do calculations and inferences for
estimated marginal means, and contain nothing that depends on the
model-fitting procedure.
cov.reduce may be a function, logical value, formula, or a named list of
these.
If a single function, it is applied to each covariate.
If logical and TRUE, mean is used. If logical and FALSE, it is equivalent to
specifying function(x) sort(unique(x)), and these values are considered part
of the reference grid; thus, it is a handy alternative to specifying these
same values in at.
If a formula (which must be two-sided), then a model is fitted to that
formula using lm; then in the reference grid, its response variable is set
to the results of predict for that model, with the reference grid as
newdata. (This is done after the reference grid is determined.) A formula is
appropriate here when you think experimental conditions affect the covariate
as well as the response.
If cov.reduce is a named list, then the above criteria are used to determine
what to do with covariates named in the list. (However, formula elements do
not need to be named, as those names are determined from the formulas'
left-hand sides.) Any unresolved covariates are reduced using "mean".
Any cov.reduce specification for a covariate also named in at is ignored.
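The forms of cov.reduce described above can be compared side by side. This is a sketch using the fiber dataset shipped with emmeans. The object names are arbitrary, and reading the levels and grid slots of the result is just one convenient way to see which reference values were chosen.

```r
library(emmeans)

fiber.lm <- lm(strength ~ machine * diameter, data = fiber)

# Default (cov.reduce = mean): diameter is fixed at its mean
rg_mean <- ref_grid(fiber.lm)

# A single function applied to each covariate: here, min and max
rg_range <- ref_grid(fiber.lm, cov.reduce = range)

# FALSE: keep every distinct observed value, as with sort(unique(x))
rg_all <- ref_grid(fiber.lm, cov.reduce = FALSE)

# Named list: reduce diameter with the median instead of the mean
rg_med <- ref_grid(fiber.lm, cov.reduce = list(diameter = median))

# Formula: diameter is predicted from machine via lm(diameter ~ machine)
rg_form <- ref_grid(fiber.lm, cov.reduce = diameter ~ machine)

# Inspect the reference values that were used
rg_mean@levels$diameter
rg_range@levels$diameter
rg_all@levels$diameter
rg_form@grid
```

In the formula case the covariate no longer has a single reference value; each row of the grid carries the diameter predicted for its machine.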
Care must be taken when covariate values depend on one another. For example,
when a polynomial model was fitted using predictors x, x2 (equal to x^2),
and x3 (equal to x^3), the reference grid will by default set x2 and x3 to
their means, which is inconsistent. The user should instead use the at
argument to set these to the square and cube of mean(x). Better yet, fit the
model using a formula involving poly(x, 3) or I(x^2) and I(x^3); then there
is only x appearing as a covariate; it will be set to its mean, and the
model matrix will have the correct corresponding quadratic and cubic terms.
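The inconsistency can be seen on a small simulated quadratic example. The data frame d, its columns, and the model names below are hypothetical. The sketch compares a model with a separate x2 column against the same model written with I(x^2).

```r
library(emmeans)

# Simulated (hypothetical) data for a quadratic relationship
set.seed(1)
d <- data.frame(x = runif(40, 0, 4))
d$x2 <- d$x^2
d$y  <- 1 + 2 * d$x - 0.5 * d$x2 + rnorm(40, sd = 0.2)

m_sep <- lm(y ~ x + x2, data = d)      # x2 is a separate covariate
m_I   <- lm(y ~ x + I(x^2), data = d)  # only x is a covariate

# Default: x2 is set to mean(x^2), which is not mean(x)^2
rg_sep <- ref_grid(m_sep)

# Manual fix: force x2 to the square of mean(x)
rg_fix <- ref_grid(m_sep, at = list(x2 = mean(d$x)^2))

# Preferred: the model matrix computes the squared term from x itself
rg_I <- ref_grid(m_I)

c(default = predict(rg_sep), fixed = predict(rg_fix), I_form = predict(rg_I))
```

The fixed and I() versions should agree, while the default grid for m_sep predicts at a point that does not lie on the fitted curve.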
Support for covariates that appear in the dataset as matrices is very
limited. If the matrix has but one column, it is treated like an ordinary
covariate. Otherwise, with more than one column, each column is reduced to a
single reference value -- the result of applying cov.reduce to each column
(averaged together if that produces more than one value); you may not
specify values in at; and they are not treated as variables in the reference
grid, except for purposes of obtaining predictions.
Ability to support a particular class of object depends on the existence of
recover_data and emm_basis methods -- see extending-emmeans for details. The
call methods("recover_data") will help identify these.
Data. In certain models (e.g., results of glmer.nb), it is not possible to
identify the original dataset. In such cases, we can work around this by
setting data equal to the dataset used in fitting the model, or a suitable
subset. Only the complete cases in data are used, so it may be necessary to
exclude some unused variables. Using data can also help save computing,
especially when the dataset is large. In any case, data must represent all
factor levels used in fitting the model. It cannot be used as an alternative
to at. (Note: If there is a pattern of NAs that caused one or more factor
levels to be excluded when fitting the model, then data should also exclude
those levels.)
Covariance matrix. By default, the variance-covariance matrix for the fixed
effects is obtained from object, usually via its vcov method. However, the
user may override this via a vcov. argument, specifying a matrix or a
function. If a matrix, it must be square and of the same dimension and
parameter order as the fixed effects. If a function, it must return a
suitable matrix when it is called with object as its only argument.
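Both ways of supplying vcov. can be sketched as follows. The doubled covariance matrix and the helper double_vcov are purely illustrative assumptions with no statistical meaning; in practice one would pass a robust estimator instead.

```r
library(emmeans)

fiber.lm <- lm(strength ~ machine * diameter, data = fiber)

rg_default <- ref_grid(fiber.lm)

# (a) A matrix with the same dimension and parameter order as coef(fiber.lm)
V2 <- 2 * vcov(fiber.lm)
rg_matrix <- ref_grid(fiber.lm, vcov. = V2)

# (b) A function of the model object that returns such a matrix
#     (hypothetical helper; the ... just absorbs any extra arguments)
double_vcov <- function(object, ...) 2 * vcov(object)
rg_function <- ref_grid(fiber.lm, vcov. = double_vcov)

se_default  <- summary(rg_default)$SE
se_matrix   <- summary(rg_matrix)$SE
se_function <- summary(rg_function)$SE
cbind(se_default, se_matrix, se_function)
```

Only the standard errors change; the predictions themselves depend on the coefficients and are unaffected by vcov..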
Nested factors. Having a nesting structure affects marginal averaging in
emmeans in that it is done separately for each level (or combination
thereof) of the grouping factors. ref_grid tries to discern which factors
are nested in other factors, but it is not always obvious, and if it misses
some, the user must specify this structure via nesting, or later using
update.emmGrid. The nesting argument may be a character vector or a named
list. If a list, each name should be the name of a single factor in the
grid, and its entry a character vector of the name(s) of its grouping
factor(s). nesting may also be a character value of the form
"factor1 %in% (factor2*factor3)" (the parentheses are optional). If there is
more than one such specification, they may be appended separated by commas,
or as separate elements of a character vector. For example, these
specifications are equivalent:
nesting = list(state = "country", city = c("state", "country")),
nesting = "state %in% country, city %in% (state*country)", and
nesting = c("state %in% country", "city %in% state*country").
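As a sketch of the list and character forms of nesting, consider simulated data in which each state belongs to exactly one country. The data frame d and the model nest.lm are hypothetical. Both specifications are expected to produce the same reference grid and the same marginal means.

```r
library(emmeans)

# Simulated (hypothetical) data: states s1, s2 in country A; s3, s4 in country B
set.seed(1)
d <- data.frame(
  country = factor(rep(c("A", "B"), each = 6)),
  state   = factor(rep(c("s1", "s2", "s3", "s4"), each = 3))
)
d$y <- rnorm(12, mean = as.integer(d$state))

nest.lm <- lm(y ~ country + state, data = d)

# The same nesting structure, given as a named list and as a character value
rg_list <- ref_grid(nest.lm, nesting = list(state = "country"))
rg_char <- ref_grid(nest.lm, nesting = "state %in% country")

# Marginal means for country average only over the states within each country
emm_list <- emmeans(rg_list, "country")
emm_char <- emmeans(rg_char, "country")
emm_list
```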
In certain unusual cases, a covariate (rather than a factor) may be nested.
Support for such situations is limited to the extent that only covariate
values that exactly match a value in the dataset are permitted. I recommend
supplying a reference dataset in the data argument that contains the desired
covariate values for the reference grid; then the nesting will be handled
correctly if you specify covnest = TRUE and cov.reduce = FALSE.
When the fitted model contains subscripts or explicit references to data
sets, the reference grid may optionally be post-processed to simplify the
variable names, depending on the simplify.names option (see emm_options),
which by default is TRUE. For example, if the model formula is
data1$resp ~ data1$trt + data2[[3]] + data2[["cov"]], the simplified
predictor names (for use, e.g., in the specs for emmeans) will be trt,
data2[[3]], and cov. Numerical subscripts are not simplified; nor are
variables having simplified names that coincide, such as if data2$trt were
also in the model.
Please note that this simplification is performed after the reference grid
is constructed. Thus, non-simplified names must be used in the at argument
(e.g., at = list(`data2[["cov"]]` = 2:4)).
If you don't want names simplified, use emm_options(simplify.names = FALSE).
There is a subtle difference between specifying type = "response" and
transform = "response". While the summary statistics for the grid
itself are the same, subsequent use in emmeans
will yield
different results if there is a response transformation or link function.
With type = "response", EMMs are computed by averaging together
predictions on the linear-predictor scale and then back-transforming
to the response scale; while with transform = "response", the
predictions are already on the response scale so that the EMMs will be
the arithmetic means of those response-scale predictions. To add further to
the possibilities, geometric means of the response-scale predictions
are obtainable via transform = "log", type = "response".
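The difference can be checked numerically on a model with a log-transformed response. This sketch uses the pigs dataset from emmeans. It calls regrid directly rather than through the transform argument, on the assumption (taken from the argument description) that the two are equivalent. Object names are arbitrary.

```r
library(emmeans)

pigs.lm <- lm(log(conc) ~ source + factor(percent), data = pigs)

rg_type <- ref_grid(pigs.lm, type = "response")            # predict.type only
rg_tran <- regrid(ref_grid(pigs.lm), transform = "response")  # re-gridded
rg_log  <- regrid(ref_grid(pigs.lm), transform = "log")

# Predictions for the grid itself agree on the response scale
grid_type <- predict(rg_type, type = "response")
grid_tran <- predict(rg_tran)

# Marginal means differ: back-transformed mean of logs vs. arithmetic mean
est_type <- predict(emmeans(rg_type, "source"), type = "response")
est_tran <- predict(emmeans(rg_tran, "source"))

# Geometric means of the response-scale predictions
est_geo <- predict(emmeans(rg_log, "source"), type = "response")

cbind(est_type, est_tran, est_geo)
```

Because this particular model is already linear on the log scale, the geometric means should coincide with the type = "response" results, and both should fall below the arithmetic means.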
The most recent result of ref_grid, whether called directly or indirectly
via emmeans, emtrends, or some other function that calls one of these, is
saved in the user's environment as .Last.ref_grid. This facilitates checking
what reference grid was used, or reusing the same reference grid for further
calculations. This automatic saving is enabled by default, but may be
disabled via emm_options(save.ref_grid = FALSE), and re-enabled by
specifying TRUE.
To users, the ref_grid function itself is important because most of its
arguments are in effect arguments of emmeans and related functions, in that
those functions pass their ... arguments to ref_grid.
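A short sketch of this pass-through, using the fiber dataset from emmeans: supplying at directly to emmeans is expected to give the same estimates as building the reference grid first. Object names are arbitrary.

```r
library(emmeans)

fiber.lm <- lm(strength ~ machine * diameter, data = fiber)

# 'at' is not an argument of emmeans itself; it is forwarded to ref_grid
emm_direct <- emmeans(fiber.lm, "machine", at = list(diameter = 20))

# The equivalent two-step version
rg <- ref_grid(fiber.lm, at = list(diameter = 20))
emm_via_rg <- emmeans(rg, "machine")

est_direct <- predict(emm_direct)
est_via_rg <- predict(emm_via_rg)
```

The two-step form is convenient when several analyses should share one reference grid.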
The reference grid consists of combinations of independent variables over
which predictions are made. Estimated marginal means are defined as these
predictions, or marginal averages thereof. The grid is determined by first
reconstructing the data used in fitting the model (see recover_data), or by
using the data.frame provided in data. The default reference grid is
determined by the observed levels of any factors, the ordered unique values
of character-valued predictors, and the results of cov.reduce for numeric
predictors. These may be overridden using at. See also the section below on
recovering/overriding model information.
Reference grids are of class emmGrid, and several methods exist for them --
for example summary.emmGrid. Reference grids are fundamental to emmeans.
Supported models are detailed in vignette("models", "emmeans").
library(emmeans)

fiber.lm <- lm(strength ~ machine*diameter, data = fiber)
ref_grid(fiber.lm)
# Relies on automatic saving of the last reference grid being enabled
summary(.Last.ref_grid)
ref_grid(fiber.lm, at = list(diameter = c(15, 25)))

# We could substitute the sandwich estimator vcovHAC(fiber.lm)
# as follows (requires the sandwich package):
summary(ref_grid(fiber.lm, vcov. = sandwich::vcovHAC))

# If we thought that the machines affect the diameters
# (admittedly not plausible in this example), then we should use:
ref_grid(fiber.lm, cov.reduce = diameter ~ machine)

# Multivariate example
MOats.lm <- lm(yield ~ Block + Variety, data = MOats)
ref_grid(MOats.lm, mult.names = "nitro")

# Silly illustration of how to use 'mult.levs' to make combinations of two factors
ref_grid(MOats.lm, mult.levs = list(T = LETTERS[1:2], U = letters[1:2]))