This function returns a list of lists whose top-level names (keys) identify the component of the full model (i.e. the term) that the generated models can be used to test. Each entry is a list holding both the complex and simple models that can be compared to test that component. The complex model always includes the target term, and the simple model is identical to the complex model except that the target term is removed. Thus, when the models are compared (e.g. using anova(), except for Type III; see details below), the resulting values show the effect of adding the target term to the model. There are three commonly used approaches to determining what the appropriate comparison models should be, called Type I, II, and III. See the sections below for more information on these types.
generate_models(model, type = 3)
# S3 method for formula
generate_models(model, type = 3)
# S3 method for lm
generate_models(model, type = 3)
A list of the augmented models for each term, where the associated term is the key for each model in the list.
For Type I SS, or sequential SS, each term is considered in order after the preceding terms are considered. Consider the example model Y ~ A + B + A:B, where ":" indicates an interaction. To determine the Type I effect of A, we would compare the model Y ~ A to the same model without the term: Y ~ NULL. For B, we compare Y ~ A + B to Y ~ A; and for A:B, we compare Y ~ A + B + A:B to Y ~ A + B. Incidentally, the anova() function that ships with the base installation of R computes Type I statistics.
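The sequential comparisons described above can be sketched by hand with base R. This is only an illustration: the built-in mtcars data and the columns mpg, cyl, and am are stand-ins for Y, A, and B, and the intercept-only model mpg ~ 1 plays the role of Y ~ NULL.

```r
# Stand-in data: treat cyl and am as factors A and B, mpg as Y
dat <- transform(mtcars, cyl = factor(cyl), am = factor(am))

full <- lm(mpg ~ cyl + am + cyl:am, data = dat)

# One simple/complex pair per term, with terms entering in formula order
pairs <- list(
  cyl      = list(simple  = lm(mpg ~ 1, data = dat),
                  complex = lm(mpg ~ cyl, data = dat)),
  am       = list(simple  = lm(mpg ~ cyl, data = dat),
                  complex = lm(mpg ~ cyl + am, data = dat)),
  `cyl:am` = list(simple  = lm(mpg ~ cyl + am, data = dat),
                  complex = full)
)

# SS for a term = drop in residual SS when moving from simple to complex
ss_by_hand <- vapply(
  pairs,
  function(p) deviance(p$simple) - deviance(p$complex),
  numeric(1)
)

# anova() on the full model reports sequential SS for the same three terms
ss_anova <- anova(full)[["Sum Sq"]][1:3]

all.equal(unname(ss_by_hand), ss_anova)
```

Changing the order of the terms in the formula changes which models are compared, and so can change the Type I values.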
For Type II SS, or hierarchical SS, each term is considered in the presence of all of the terms that do not include it. For example, consider the three-way factorial model Y ~ A + B + C + A:B + A:C + B:C + A:B:C, where ":" indicates an interaction. The effect of A is found by comparing Y ~ B + C + B:C + A to Y ~ B + C + B:C (the only other terms included are those that do not include A). For B, the comparison models would be Y ~ A + C + A:C + B and Y ~ A + C + A:C; for A:B, the models would be Y ~ A + B + C + A:C + B:C + A:B and Y ~ A + B + C + A:C + B:C; and so on.
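The Type II selection rule can be sketched in base R as follows. The helper terms_without() is hypothetical, written only for this illustration; it is not part of the package and is not necessarily how generate_models() chooses its terms.

```r
# Hypothetical helper: the right-hand-side terms that do not contain `target`.
# A term "contains" the target when every variable in the target appears in it.
terms_without <- function(formula, target) {
  labels <- attr(terms(formula), "term.labels")
  target_vars <- strsplit(target, ":", fixed = TRUE)[[1]]
  contains_target <- vapply(
    strsplit(labels, ":", fixed = TRUE),
    function(vars) all(target_vars %in% vars),
    logical(1)
  )
  labels[!contains_target]
}

f <- Y ~ A + B + C + A:B + A:C + B:C + A:B:C

keep_A <- terms_without(f, "A")     # "B" "C" "B:C"
keep_AB <- terms_without(f, "A:B")  # "A" "B" "C" "A:C" "B:C"

# Build the simple and complex comparison formulas for the term A
simple_A <- reformulate(keep_A, response = "Y")            # Y ~ B + C + B:C
complex_A <- reformulate(c(keep_A, "A"), response = "Y")   # Y ~ B + C + B:C + A
```

The two formulas can then be fit with lm() and compared with anova(), since neither violates marginality.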
For Type III SS, or orthogonal SS, each term is considered in the presence of all of the other terms. For example, consider the two-way factorial model Y ~ A + B + A:B, where ":" indicates an interaction between the terms. The effect of A is found by comparing Y ~ B + A:B + A to Y ~ B + A:B; for B, the comparison models would be Y ~ A + A:B + B and Y ~ A + A:B; and for A:B, the models would be Y ~ A + B + A:B and Y ~ A + B.
Unfortunately, anova() cannot be used to compare Type III models. anova() does not allow for violation of the principle of marginality, which is the rule that interactions should only be tested in the context of their lower-order terms. When an interaction term is present in a model, anova() will automatically add in the lower-order terms, so a model like Y ~ A + A:B cannot be compared as written: the lower-order term B is added, and the model Y ~ A + B + A:B is used instead. To get the appropriate statistics for Type III comparisons, use drop1() with the full scope, i.e. drop1(model_fit, scope = . ~ .).
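A minimal sketch of the drop1() call with base R follows. The mtcars data and the columns mpg, cyl, and am are stand-ins chosen for illustration, and test = "F" is an optional argument added here to request F tests.

```r
# Stand-in data: treat cyl and am as factors A and B, mpg as Y
dat <- transform(mtcars, cyl = factor(cyl), am = factor(am))

full <- lm(mpg ~ cyl + am + cyl:am, data = dat)

# scope = . ~ . asks drop1() to drop every term in turn, including the
# main effects that are marginal to the interaction
type3 <- drop1(full, scope = . ~ ., test = "F")

# One row per dropped term, plus a "<none>" row for the full model
rownames(type3)

# SS attributed to each term (NA in the "<none>" row)
type3[["Sum of Sq"]]
```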
# create all type 2 comparison models
mod <- lm(Thumb ~ Height * Sex, data = Fingers)
mods_2 <- generate_models(mod, type = 2)
# compute the SS for the Height term
mod_Height <- anova(mods_2[["Height"]]$simple, mods_2[["Height"]]$complex)
mod_Height[["Sum of Sq"]][[2]]