Function to test differences of adjusted predictions for statistical significance. This is usually called contrasts or (pairwise) comparisons, or "marginal effects". hypothesis_test() is an alias.
test_predictions(object, ...)
hypothesis_test(object, ...)
# S3 method for default
test_predictions(
object,
terms = NULL,
by = NULL,
test = "pairwise",
equivalence = NULL,
scale = "response",
p_adjust = NULL,
df = NULL,
ci_level = 0.95,
collapse_levels = FALSE,
margin = "mean_reference",
engine = "marginaleffects",
verbose = TRUE,
...
)
# S3 method for ggeffects
test_predictions(
object,
by = NULL,
test = "pairwise",
equivalence = NULL,
scale = "response",
p_adjust = NULL,
df = NULL,
collapse_levels = FALSE,
engine = "marginaleffects",
verbose = TRUE,
...
)
A data frame containing predictions (e.g. for test = NULL), contrasts or pairwise comparisons of adjusted predictions or estimated marginal means.
object: A fitted model object, or an object of class ggeffects. If object is of class ggeffects, arguments terms, margin and ci_level are taken from the ggeffects object and don't need to be specified.
...: Arguments passed down to data_grid() when creating the reference grid, and to marginaleffects::predictions() or marginaleffects::slopes(), respectively. For instance, arguments type or transform can be used to back-transform comparisons and contrasts to different scales. vcov can be used to calculate heteroscedasticity-consistent standard errors for contrasts. See examples at the bottom of this vignette for further details.
To define a heteroscedasticity-consistent variance-covariance matrix, you can either use the same arguments as for predict_response() etc., namely vcov_fun, vcov_type and vcov_args, which are then transformed into a matrix and passed down to the vcov argument in marginaleffects, or you can use the vcov argument directly. See ?marginaleffects::slopes for further details.
terms: If object is an object of class ggeffects, the same terms argument is used as for the predictions, i.e. terms can be ignored. Else, if object is a model object, terms must be a character vector with the names of the focal terms from object, for which contrasts or comparisons should be displayed. At least one term is required, maximum length is three terms. If the first focal term is numeric, contrasts or comparisons for the slopes of this numeric predictor are computed (possibly grouped by the levels of further categorical focal predictors).
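The numeric-first-term case can be sketched as follows. This is only a minimal illustration, assuming the ggeffects and marginaleffects packages are installed; the mtcars data, the model formula and the object names are illustrative choices and do not come from the documentation above.

```r
library(ggeffects)

# Illustrative data: treat transmission type as a categorical predictor
d <- mtcars
d$am <- factor(d$am, labels = c("automatic", "manual"))

# Model with an interaction between a numeric and a categorical predictor
m <- lm(mpg ~ wt * am, data = d)

# The first focal term (wt) is numeric, so the slopes of wt are
# compared between the levels of the second focal term (am)
slope_comparison <- test_predictions(m, terms = c("wt", "am"))
slope_comparison
```

With a categorical first focal term, the same call would instead compare adjusted predictions between its levels.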
by: Character vector specifying the names of predictors to condition on. Hypothesis test is then carried out for focal terms by each level of by variables. This is useful especially for interaction terms, where we want to test the interaction within "groups". by is only relevant for categorical predictors.
test: Hypothesis to test, defined as character string. Can be one of:
"pairwise" (default), to test pairwise comparisons.
"contrast" to test simple contrasts (i.e. each level is tested against the average over all levels).
"exclude" to test simple contrasts (i.e. each level is tested against the average over all other levels, excluding the level that is being tested).
"interaction" to test interaction contrasts (difference-in-difference contrasts).
"consecutive" to test contrasts between consecutive levels of a predictor.
"polynomial" to test orthogonal polynomial contrasts, assuming equally-spaced factor levels.
A character string with a custom hypothesis, e.g. "b2 = b1". This would test if the second level of a predictor is different from the first level. Custom hypotheses are very flexible. It is also possible to test interaction contrasts (difference-in-difference contrasts) with custom hypotheses, e.g. "(b2 - b1) = (b4 - b3)". See also section Introduction into contrasts and pairwise comparisons.
A data frame with custom contrasts. See 'Examples'.
NULL, in which case simple contrasts are computed.
Technical details about the packages used as back-end to calculate contrasts and pairwise comparisons are provided in the section Packages used as back-end to calculate contrasts and pairwise comparisons below.
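As a hedged sketch of how the test argument might be used, the following compares the default pairwise test with a custom hypothesis string. It assumes the ggeffects and marginaleffects packages are installed; the data set, model and object names are illustrative assumptions.

```r
library(ggeffects)

# Illustrative data: number of cylinders as a three-level factor
d <- mtcars
d$cyl <- factor(d$cyl)

m <- lm(mpg ~ cyl + wt, data = d)

# Default: all pairwise comparisons between the levels of cyl
pairwise_result <- test_predictions(m, terms = "cyl")
pairwise_result

# Custom hypothesis: does the second level of cyl differ from the first?
custom_result <- test_predictions(m, terms = "cyl", test = "b2 = b1")
custom_result
```

The custom hypothesis string refers to the rows of the adjusted predictions by position (b1, b2, ...), so it is worth inspecting the predictions first to see which row is which.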
equivalence: ROPE's lower and higher bounds. Should be "default" or a vector of length two (e.g., c(-0.1, 0.1)). If "default", bayestestR::rope_range() is used. Instead of using the equivalence argument, it is also possible to call the equivalence_test() method directly. This requires the parameters package to be loaded. When using equivalence_test(), two more columns with information about the ROPE coverage and decision on H0 are added. Furthermore, it is possible to plot() the results from equivalence_test(). See bayestestR::equivalence_test() and parameters::equivalence_test.lm() for details.
scale: Character string, indicating the scale on which the contrasts or comparisons are represented. Can be one of:
"response" (default), which would return contrasts on the response scale (e.g. for logistic regression, as probabilities);
"link" to return contrasts on scale of the linear predictors (e.g. for logistic regression, as log-odds);
"probability" (or "probs") returns contrasts on the probability scale, which is required for some model classes, like MASS::polr();
"oddsratios" to return contrasts on the odds ratio scale (only applies to logistic regression models);
"irr" to return contrasts on the incidence rate ratio scale (only applies to count models);
or a transformation function like "exp" or "log", to return transformed (exponentiated or logarithmic, respectively) contrasts; note that these transformations are applied to the response scale.
Note: If the scale argument is not supported by the provided object, it is automatically changed to a supported scale-type (a message is printed when verbose = TRUE).
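To make the difference between the scales concrete, the sketch below computes one contrast from a logistic regression by hand, using only base R. It is not how test_predictions() works internally; the mtcars data and all object names are illustrative assumptions.

```r
# Logistic regression with a single binary predictor (illustrative data)
m <- glm(vs ~ factor(am), family = binomial(), data = mtcars)
nd <- data.frame(am = c(0, 1))

# Predictions on the link scale (log-odds) and the response scale (probabilities)
log_odds <- predict(m, newdata = nd, type = "link")
probs <- predict(m, newdata = nd, type = "response")

# The same contrast (am = 1 vs. am = 0) expressed on three scales
contrast_response <- unname(probs[2] - probs[1])     # difference in probabilities
contrast_link <- unname(log_odds[2] - log_odds[1])   # difference in log-odds
contrast_oddsratio <- exp(contrast_link)             # odds ratio

c(response = contrast_response, link = contrast_link, oddsratio = contrast_oddsratio)
```

The contrast on the link scale equals the model coefficient for am, and exponentiating it gives the odds ratio, whereas the contrast on the response scale is a difference between two probabilities.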
p_adjust: Character vector, if not NULL, indicates the method to adjust p-values. See stats::p.adjust() or stats::p.adjust.methods for details. Further possible adjustment methods are "tukey" or "sidak", and for johnson_neyman(), "fdr" (or "bh") and "esarey" (or its short-cut "es") are available options. Some caution is necessary when adjusting p-values for multiple comparisons. See also section P-value adjustment below.
df: Degrees of freedom that will be used to compute the p-values and confidence intervals. If NULL, degrees of freedom will be extracted from the model using insight::get_df() with type = "wald".
ci_level: Numeric, the level of the confidence intervals. If object is an object of class ggeffects, the same ci_level argument is used as for the predictions, i.e. ci_level can be ignored.
collapse_levels: Logical, if TRUE, term labels that refer to identical levels are no longer separated by "-", but instead collapsed into a unique term label (e.g., "level a-level a" becomes "level a"). See 'Examples'.
margin: Character string, indicates how to marginalize over non-focal terms. See predict_response() for details. If object is an object of class ggeffects, the same margin argument is used as for the predictions, i.e. margin can be ignored.
engine: Character string, indicates the package to use for computing contrasts and comparisons. Usually, this argument can be ignored, unless you explicitly want to use a package other than marginaleffects to calculate contrasts and pairwise comparisons. engine can be either "marginaleffects" (default) or "emmeans". The latter is useful when the marginaleffects package is not available, or when the emmeans package is preferred. Note that using emmeans as back-end is currently not as feature-rich as the default (marginaleffects) and is still in development. Setting engine = "emmeans" provides some additional test options: "interaction" to calculate interaction contrasts, "consecutive" to calculate contrasts between consecutive levels of a predictor, or a data frame with custom contrasts (see also test). There is an experimental option as well, engine = "ggeffects". However, this is currently work-in-progress and offers far fewer options than the default engine, "marginaleffects". It can be faster in some cases, though, and works for comparing predicted random effects in mixed models, or predicted probabilities of the zero-inflation component. If the marginaleffects package is not installed, the emmeans package is used automatically. If emmeans is not installed either, engine = "ggeffects" is used.
verbose: Toggle messages and warnings.
There are many ways to test contrasts or pairwise comparisons. A detailed introduction with many (visual) examples is shown in this vignette.
A simple workflow includes calculating adjusted predictions and passing the results directly to test_predictions(), e.g.:
library(ggeffects)
# 1. fit your model
model <- lm(mpg ~ hp + wt + am, data = mtcars)
# 2. calculate adjusted predictions
pr <- predict_response(model, "am")
pr
# 3. test pairwise comparisons
test_predictions(pr)
See also this vignette.
The test argument is used to define which kind of contrast or comparison should be calculated. The default is to use the marginaleffects package. Here are some technical details about the packages used as back-end. When test is...
"pairwise" (default), pairwise comparisons are based on the marginaleffects package.
"contrast" uses the emmeans package, i.e. emmeans::contrast(method = "eff") is called.
"exclude" relies on the emmeans package, i.e. emmeans::contrast(method = "del.eff") is called.
"polynomial" relies on the emmeans package, i.e. emmeans::contrast(method = "poly") is called.
"interaction" uses the emmeans package, i.e. emmeans::contrast(interaction = ...) is called.
"consecutive" also relies on the emmeans package, i.e. emmeans::contrast(method = "consec") is called.
a character string with a custom hypothesis, the marginaleffects package is used.
a data frame with custom contrasts, emmeans is used again.
NULL calls functions from the marginaleffects package with hypothesis = NULL.
If all focal terms are only present as random effects in a mixed model, or if predicted probabilities for the zero-inflation component of a model should be tested, functions from the ggeffects package are used. There is an example for pairwise comparisons of random effects in this vignette.
Note that for p-value adjustment methods supported by p.adjust() (see also p.adjust.methods), each row is considered as one set of comparisons, no matter which test was specified. For instance, when test_predictions() returns eight rows of predictions (when test = NULL) and p_adjust = "bonferroni", the p-values are adjusted in the same way as if we had a test of pairwise comparisons (test = "pairwise") where eight rows of comparisons are returned. For methods "tukey" or "sidak", a rank adjustment is done based on the number of combinations of levels from the focal predictors in terms. Thus, the latter two methods may be useful for certain tests only, in particular pairwise comparisons.
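The row-based logic for p.adjust() methods can be illustrated with base R alone. The eight p-values below are made up for illustration and stand in for the eight rows of a returned result; nothing here calls test_predictions() itself.

```r
# Eight hypothetical p-values, one per row of a returned result
p <- c(0.001, 0.004, 0.012, 0.03, 0.04, 0.2, 0.5, 0.9)

# Bonferroni adjustment as done by stats::p.adjust()
p_bonferroni <- p.adjust(p, method = "bonferroni")

# The same adjustment by hand: multiply by the number of rows, cap at 1
p_manual <- pmin(1, p * length(p))

# Holm is a step-down variant that is never more conservative than Bonferroni
p_holm <- p.adjust(p, method = "holm")

data.frame(p, p_bonferroni, p_manual, p_holm)
```

Because the adjustment only depends on the number of rows, eight rows of predictions and eight rows of pairwise comparisons are treated identically.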
For johnson_neyman(), the only available adjustment methods are "fdr" (or "bh") (Benjamini & Hochberg 1995) and "esarey" (or "es") (Esarey and Sumner 2017). These usually return similar results. The major difference is that "fdr" can be slightly faster and more stable in edge cases, but only the p-values are adjusted and the confidence intervals are not updated. "esarey" is slower, but confidence intervals are updated as well.
ggeffects_test_engine can be used as option to either use the marginaleffects package for computing contrasts and comparisons (default), or the emmeans package (e.g. options(ggeffects_test_engine = "emmeans")). The latter is useful when the marginaleffects package is not available, or when the emmeans package is preferred. You can also provide the engine directly, e.g. test_predictions(..., engine = "emmeans"). Note that using emmeans as back-end is currently not as feature-rich as the default (marginaleffects) and is still in development.
If engine = "emmeans", the test argument can also be "interaction" to calculate interaction contrasts (difference-in-difference contrasts), "consecutive" to calculate contrasts between consecutive levels of a predictor, or a data frame with custom contrasts. If test is one of the latter options, and engine is not specified, the engine is automatically set to "emmeans".
If the marginaleffects package is not installed, the emmeans package is used automatically. If emmeans is not installed either, engine = "ggeffects" is used.
The verbose argument can be used to display or silence messages and warnings. Furthermore, options() can be used to set defaults for the print() and print_html() method. The following options are available, which can simply be run in the console:
ggeffects_ci_brackets: Define a character vector of length two, indicating the opening and closing brackets that enclose the confidence interval values, e.g. options(ggeffects_ci_brackets = c("[", "]")).
ggeffects_collapse_ci: Logical, if TRUE, the columns with predicted values (or contrasts) and confidence intervals are collapsed into one column, e.g. options(ggeffects_collapse_ci = TRUE).
ggeffects_collapse_p: Logical, if TRUE, the columns with predicted values (or contrasts) and p-values are collapsed into one column, e.g. options(ggeffects_collapse_p = TRUE). Note that p-values are replaced by asterisk-symbols (stars) or empty strings when ggeffects_collapse_p = TRUE, depending on the significance level.
ggeffects_collapse_tables: Logical, if TRUE, multiple tables for subgroups are combined into one table. Only works when there is more than one focal term, e.g. options(ggeffects_collapse_tables = TRUE).
ggeffects_output_format: String, either "text", "markdown" or "html". Defines the default output format from predict_response(). If "html", a formatted HTML table is created and printed to the view pane. "markdown" creates a markdown-formatted table inside Rmarkdown documents, and prints a text-format table to the console when used interactively. If "text" or NULL, a formatted table is printed to the console, e.g. options(ggeffects_output_format = "html").
ggeffects_html_engine: String, either "tt" or "gt". Defines the default engine to use for printing HTML tables. If "tt", the tinytable package is used, if "gt", the gt package is used, e.g. options(ggeffects_html_engine = "gt").
Use options(<option_name> = NULL) to remove the option.
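Setting and removing these options uses the standard base R options() mechanism, which can be sketched as follows. The option names are the ones listed above; the helper object names are only illustrative, and the snippet does not require ggeffects to be loaded.

```r
# Set two of the print options described above
options(
  ggeffects_ci_brackets = c("[", "]"),
  ggeffects_collapse_ci = TRUE
)

# Inspect the current value of an option
brackets_set <- getOption("ggeffects_ci_brackets")
brackets_set

# Remove the options again by setting them to NULL
options(ggeffects_ci_brackets = NULL, ggeffects_collapse_ci = NULL)

# After removal, getOption() returns NULL, i.e. package defaults apply
brackets_after_reset <- getOption("ggeffects_ci_brackets")
is.null(brackets_after_reset)
```

Options set this way last for the current R session only; putting them in a startup file such as .Rprofile is one way to make them persistent.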
Esarey, J., & Sumner, J. L. (2017). Marginal effects in interaction models: Determining and controlling the false positive rate. Comparative Political Studies, 1–33. Advance online publication. doi: 10.1177/0010414017730080
There is also an equivalence_test() method in the parameters package (parameters::equivalence_test.lm()), which can be used to test contrasts or comparisons for practical equivalence. Its result also has a plot() method, hence it is possible to do something like:
library(parameters)
predict_response(model, focal_terms) |>
equivalence_test() |>
plot()