Generate a model selection table of models with combinations (subsets) of fixed effect terms in the global model, with optional model inclusion rules.
dredge(global.model, beta = c("none", "sd", "partial.sd"), evaluate = TRUE,
rank = "AICc", fixed = NULL, m.lim = NULL, m.min, m.max, subset,
trace = FALSE, varying, extra, ct.args = NULL, deps = attr(allTerms0, "deps"),
cluster = NULL,
...)# S3 method for model.selection
print(x, abbrev.names = TRUE, warnings = getOption("warn") != -1L, ...)
An object of class c("model.selection", "data.frame")
, being a
data.frame
, where each row represents one model.
See model.selection.object
for its structure.
a fitted ‘global’ model object. See ‘Details’ for a list of supported types.
indicates whether and how the coefficients are standardized, and
must be one of "none"
, "sd"
or "partial.sd"
. You can
specify just the initial letter. "none"
corresponds to unstandardized
coefficients, "sd"
and "partial.sd"
to coefficients
standardized by SD and Partial SD, respectively. For
backwards compatibility, logical value is also accepted, TRUE
is
equivalent to "sd"
and FALSE
to "none"
.
See std.coef
.
whether to evaluate and rank the models. If FALSE
, a
list of unevaluated call
s is returned.
optionally, the rank function returning a sort of an information
criterion, to be used instead AICc
, e.g. AIC
, QAIC
or
BIC
.
See ‘Details’.
optional, either a single-sided formula or a character vector giving names of terms to be included in all models. Not to be confused with fixed effects. See ‘Subsetting’.
optionally, the limits c(lower, upper)
for the number of terms in a single model (excluding the intercept). An
NA
means no limit. See ‘Subsetting’.
Specifying limits as m.min
and m.max
is allowed for backward
compatibility.
logical expression or a matrix
describing models to
be kept in the resulting set. NULL
or TRUE
disables subsetting. For details, see ‘Subsetting’.
if TRUE
or 1
, all calls to the fitting function
are printed before actual fitting takes place. If trace > 1
, a progress bar
is displayed.
optionally, a named list describing the additional arguments
to vary between the generated models. Item names correspond to the
arguments, and each item provides a list of choices (i.e. list(arg1 =
list(choice1, choice2, ...), ...)
). Complex elements in the choice list
(such as family
objects) should be either named (uniquely) or quoted
(unevaluated, e.g. using alist
, see quote
),
otherwise the result may be visually unpleasant. See example in
Beetle
.
optional additional statistics to be included in the result,
provided as functions, function names or a list of such (preferably named
or quoted). As with the rank
argument, each function must accept as
an argument a fitted model object and return (a value coercible to) a
numeric vector.
This could be, for instance, additional information criteria or goodness-of-fit
statistics. The character strings "R^2"
and "adjR^2"
are
treated in a special way and add a likelihood-ratio based \(R^{2}\) and
modified-\(R^{2}\) to the result, respectively (this is more efficient than using
r.squaredLR
directly).
a model.selection
object, returned by dredge
.
Should term names in the table header be abbreviated when
printed? This is the default. If full names are required, use print()
explicitly with this argument set to FALSE
.
if TRUE
, errors and warnings issued during the model
fitting are printed below the table (only with pdredge
).
To permanently remove the warnings, set the object's attribute
"warnings"
to NULL
.
optional list of arguments to be passed to
coefTable
(e.g. dispersion
parameter for glm
affecting standard errors used in subsequent
model averaging
).
a “dependency matrix” as returned by getAllTerms
,
attribute "deps"
. Can be used to fine-tune marginality
exceptions.
if a valid "cluster"
object is given, it is used for
parallel execution. If NULL
or omitted, execution is
single-threaded.
With parallel calculation, an extra argument check
is accepted.
See pdredge
for details and examples.
optional arguments for the rank
function. Any can be
an unevaluated expression, in which case any x
within it will be
substituted with the current model.
Kamil Bartoń
Models are fitted through repeated evaluation of the modified call extracted from
the global.model
(in a similar fashion to update
). This
approach, while having the advantage that it can be applied to most model types through the
usual formula interface, can have a considerable computational overhead.
Note that the number of combinations grows exponentially with the number of predictors (\(2^{N}\), less when interactions are present, see below).
The fitted model objects are not stored in the result. To get (a subset of)
the models, use get.models
on the object returned by dredge
.
Another way to get all the models is to run
lapply(dredge(..., evaluate = FALSE), eval)
,
which avoids fitting models twice.
For a list of model types that can be used as a global.model
see
the list of supported models. Modelling functions that
do not store a call
in their result should be run via a
wrapper function created by updateable
.
rank
is found by a call to match.fun
and may be specified as a
function, a symbol, or as a character string specifying a function to be searched
for from the environment of the call to dredge
. It can be also a
one-element named list, where the first element is taken as the rank function.
The function rank
must accept a model object as its first argument and
always return a scalar.
By default, marginality constraints are respected, so that “all possible
combinations” include only those that contain interactions with their
respective main effects and all lower order terms, unless the
global.model
makes an exception to this principle (e.g. due
to a nested design such as a / b
).
The resulting set of models can be constrained with three methods:
(1) set limits on
the number of terms in a model with m.lim
, (2) bind term(s) to all
models with fixed
, and (3) use subset
for more complex rules.
To be included in the selection table, the formulation of a model must satisfy
all these conditions.
subset
can be an expression or a matrix.
If a matrix, it should be a logical, lower triangular matrix, with
rows and columns corresponding to global.model
terms. If this matrix has
dimnames
, they must match the term names (as returned by getAllTerms
).
Unmatched names are silently ignored. Otherwise, if rows or columns are unnamed,
they are matched positionally to the model terms, and dim(subset)
must
be equal to the number of terms.
For example, subset["a", "b"] == FALSE
excludes models
with both a and b terms; and if unnamed, subset
,
subset[2, 3] == FALSE
will prevent the second and third terms of the
global model from being both in the same model.
demo(dredge.subset)
has examples of using the subset
matrix in
conjunction with correlation matrices to exclude models containing collinear
predictors.
In the form of an expression
, the argument subset
acts similarly
to that of subset()
for data.frame
s. Model terms can be referred
to by name as variables in the expression, except that they are interpreted as
logical values indicating the presence of a term in the model.
The expression can contain any of the global.model
term names, as well as
names of the varying
list items. global.model
term names take
precedence when identical to names of varying
, so to avoid ambiguity
varying
variables in subset
expression should be enclosed in
V()
(e.g. V(family) == "Gamma"
) assuming that
varying
is something like list(family =
c("Gamma", ...))
).
If elements of varying
are unnamed, they are coerced into names. Calls
and symbols are represented as character values (via "deparse"), and everything
except numeric, logical, character and NULL values is represented by element
numbers (e.g. subset = V(family) == 2
points to Gamma
family in
varying =
list(family =
list(gaussian, Gamma)
). This can
easily become obscure, so using named lists in varying
is recommended.
Examples can be found in demo(dredge.varying)
.
Term names appearing in fixed
and subset
must be given exactly
as they are returned by getAllTerms(global.model)
, which may differ
from the original term names (e.g. the interaction term components are ordered
alphabetically).
The with(x)
and with(+x)
notation indicates, respectively, any and
all interactions including the main effect term x
. This is only effective
with marginality exceptions. The extended form with(x, order)
allows to
specify the order of interaction of terms of which x
is a part. For
instance, with(b, 2:3)
selects models with at least one second- or
third-order interaction of variable b
. The second (positional)
argument is coerced to an integer vector. The “dot” notation .(x)
is
an alias for with
.
The special variable `*nvar*`
(backtick-quoted), in the subset
expression is equal to the number of
terms in the model (not the number of parameters).
To include a model term conditionally on the presence of another term,
use dc
(“dependency chain”) in
the subset
expression. dc
takes any number of term names as
arguments, and allows a term to be included only if all preceding ones
are also present (e.g. subset = dc(a, b, c)
allows for models a
,
a+b
and a+b+c
but not b
, c
, b+c
or
a+c
).
subset
expression can have a form of an unevaluated call
,
expression
object, or a one-sided formula
. See ‘Examples’.
Compound model terms (such as interactions, ‘as-is’ expressions within
I()
or smooths in gam
) should be enclosed within curly brackets
(e.g. {s(x,k=2)}
), or backticks (like non-syntactic
names, e.g.
`s(x, k = 2)`
), except when they are arguments to with
or dc
.
Backtick-quoted names must match exactly (including whitespace) the term names
as returned by getAllTerms
.
subset
expression syntax summary
a & b
indicates that model terms a and b must be present (see Logical Operators)
{log(x,2)}
or `
log(x, 2)
`
represent a complex
model term log(x, 2)
V(x)
represents a varying
item x
with(x)
indicates that at least one term containing the main effect term x must be present
with(+x)
indicates that all the terms containing the main effect term x must be present
with(x, n:m)
indicates that at least one term containing an n-th to m-th order interaction term of x must be present
dc(a, b, c,...)
‘dependency chain’: b is allowed only if a is present, and c only if both a and b are present, etc.
`*nvar*`
the number of terms in the model.
To simply keep certain terms in all models, it is much more efficient to use
the fixed
argument. The fixed
formula is interpreted in the same
manner as model formula, so the terms must not be quoted.
Use of na.action = "na.omit"
(R's default) or "na.exclude"
in
global.model
must be avoided, as it results with sub-models fitted to
different data sets if there are missing values. An error is thrown if it is
detected.
It is a common mistake to give na.action
as an argument in the call
to dredge
(typically resulting in an error from the rank
function to which the argument is passed through ‘...’), while the
correct way
is either to pass na.action
in the call to the global model or to set
it as a global option.
If present in the global.model
, the intercept will be included in all
sub-models.
There are subset
and
plot
methods, the latter creates a
graphical representation of model weights and per-model term sum of weights.
Coefficients can be extracted with coef
or coefTable
.
get.models
, model.avg
. model.sel
for
manual model selection tables.
Possible alternatives: glmulti
in package glmulti
and bestglm
(bestglm).
regsubsets
in package leaps also performs all-subsets
regression.
Variable selection through regularization provided by various packages, e.g. glmnet, lars or glmmLasso.
# Example from Burnham and Anderson (2002), page 100:
# prevent fitting sub-models to different datasets
oop <-
options(na.action = "na.fail")
fm1 <- lm(y ~ ., data = Cement)
dd <- dredge(fm1)
subset(dd, delta < 4)
# Visualize the model selection table:
if(require(graphics)) {
par(mar = c(3,5,6,4))
plot(dd, labAsExpr = TRUE)
}
# Model average models with delta AICc < 4
model.avg(dd, subset = delta < 4)
#or as a 95% confidence set:
model.avg(dd, subset = cumsum(weight) <= .95) # get averaged coefficients
#'Best' model
summary(get.models(dd, 1)[[1]])
if (FALSE) {
# Examples of using 'subset':
# keep only models containing X3
dredge(fm1, subset = ~ X3) # subset as a formula
dredge(fm1, subset = expression(X3)) # subset as expression object
# the same, but more effective:
dredge(fm1, fixed = "X3")
# exclude models containing both X1 and X2 at the same time
dredge(fm1, subset = !(X1 && X2))
# Fit only models containing either X3 or X4 (but not both);
# include X3 only if X2 is present, and X2 only if X1 is present.
dredge(fm1, subset = dc(X1, X2, X3) && xor(X3, X4))
# the same as above, without "dc"
dredge(fm1, subset = (X1 | !X2) && (X2 | !X3) && xor(X3, X4))
# Include only models with up to 2 terms (and intercept)
dredge(fm1, m.lim = c(0, 2))
}
# Add R^2 and F-statistics, use the 'extra' argument
dredge(fm1, m.lim = c(NA, 1), extra = c("R^2", F = function(x)
summary(x)$fstatistic[[1]]))
# with summary statistics:
dredge(fm1, m.lim = c(NA, 1), extra = list(
"R^2", "*" = function(x) {
s <- summary(x)
c(Rsq = s$r.squared, adjRsq = s$adj.r.squared,
F = s$fstatistic[[1]])
})
)
# Add other information criteria (but rank with AICc):
dredge(fm1, m.lim = c(NA, 1), extra = alist(AIC, BIC, ICOMP, Cp))
options(oop)
Run the code above in your browser using DataLab