generate_models: Generate a List of Models for Computing Different Types of Sums of Squares

Description

This function returns a list of lists. The top-level keys (names) are the terms of the full model, and each key identifies the term that the models stored under it can be used to test. Each entry holds two models: the complex model, which always includes the target term, and the simple model, which is identical except that the target term is removed. Comparing the two (e.g. with anova(), except for Type III; see the Type III section below) therefore shows the effect of adding the target term to the model. There are three commonly used approaches to choosing the comparison models, called Type I, II, and III. See the sections below for more information on these types.

Usage

generate_models(model, type)

Arguments

model

The model to generate the comparison models from: an lm or aov fit, or a formula.

type

The type of sums of squares to calculate:

Use 1, I, or sequential for Type I.
Use 2, II, or hierarchical for Type II.
Use 3, III, or orthogonal for Type III.

Value

A named list with one entry per term. Each entry holds the comparison models for that term, and the term itself is the entry's key (name).

Type I

For Type I SS, or sequential SS, each term is tested in the order it appears in the model, after all preceding terms have been accounted for. Consider the example model Y ~ A + B + A:B, where ":" indicates an interaction. To determine the Type I effect of A, we compare the model Y ~ A to the same model without the term: Y ~ NULL. For B, we compare Y ~ A + B to Y ~ A; and for A:B, we compare Y ~ A + B + A:B to Y ~ A + B. Incidentally, the anova() function that ships with the base installation of R computes Type I statistics.
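The sequential comparisons can be checked by hand with base R. The following is a minimal sketch, not output from generate_models(): it assumes the built-in mtcars data, with cyl and am standing in for A and B, and fits each nested model explicitly.

```r
# Illustrative data choice (an assumption, not from this page):
# mtcars with cyl and am treated as factors A and B
dat <- transform(mtcars, cyl = factor(cyl), am = factor(am))

# Nested sequence of models: NULL, A, A + B, A + B + A:B
m0 <- lm(mpg ~ NULL, data = dat)  # intercept-only model
m1 <- lm(mpg ~ cyl, data = dat)
m2 <- lm(mpg ~ cyl + am, data = dat)
m3 <- lm(mpg ~ cyl + am + cyl:am, data = dat)

# Each Type I SS is the drop in residual SS between adjacent models
ss_by_hand <- c(
  cyl = deviance(m0) - deviance(m1),
  am = deviance(m1) - deviance(m2),
  `cyl:am` = deviance(m2) - deviance(m3)
)

# anova() on the full model reports the same sequential SS
ss_anova <- anova(m3)[c("cyl", "am", "cyl:am"), "Sum Sq"]
all.equal(unname(ss_by_hand), ss_anova)
```

Because the comparisons are sequential, reordering the terms in the formula will generally change the Type I values when the design is unbalanced.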

Type II

For Type II SS, or hierarchical SS, each term is considered in the presence of all of the terms that do not include it. For example, consider the three-way factorial model Y ~ A + B + C + A:B + A:C + B:C + A:B:C, where ":" indicates an interaction. The effect of A is found by comparing Y ~ B + C + B:C + A to Y ~ B + C + B:C (the only other terms included are those that do not include A). For B, the comparison models would be Y ~ A + C + A:C + B and Y ~ A + C + A:C; for A:B, the models would be Y ~ A + B + C + A:C + B:C + A:B and Y ~ A + B + C + A:C + B:C; and so on.
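The same rule can be sketched by hand for a smaller, two-way model. This example is illustrative only: it assumes the built-in mtcars data with cyl and am as the two factors, and ss_between() is a hypothetical helper defined here, not part of any package.

```r
# Illustrative data choice (an assumption, not from this page)
dat <- transform(mtcars, cyl = factor(cyl), am = factor(am))

# Hypothetical helper: SS gained by moving from the simple to the complex model
ss_between <- function(simple, complex, data) {
  comparison <- anova(lm(simple, data = data), lm(complex, data = data))
  comparison[["Sum of Sq"]][2]
}

# Each term is tested against the model containing every term that does not include it
type2 <- c(
  cyl = ss_between(mpg ~ am, mpg ~ am + cyl, dat),
  am = ss_between(mpg ~ cyl, mpg ~ cyl + am, dat),
  `cyl:am` = ss_between(mpg ~ cyl + am, mpg ~ cyl + am + cyl:am, dat)
)
type2
```

Note that neither main effect is tested with the interaction present, since cyl:am includes both cyl and am.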

Type III

For Type III SS, or orthogonal SS, each term is considered in the presence of all of the other terms. For example, consider the two-way factorial model Y ~ A + B + A:B, where ":" indicates an interaction. The effect of A is found by comparing Y ~ B + A:B + A to Y ~ B + A:B; for B, the comparison models would be Y ~ A + A:B + B and Y ~ A + A:B; and for A:B, the models would be Y ~ A + B + A:B and Y ~ A + B.

Unfortunately, anova() cannot be used to compare Type III models. anova() does not allow for violation of the principle of marginality, which is the rule that interactions should only be tested in the context of their lower-order terms. When an interaction term is present in a model, anova() will automatically add in the lower-order terms, so a model like Y ~ A + A:B cannot be compared as written: the lower-order term B is added, and the model Y ~ A + B + A:B is used instead. To get the appropriate statistics for Type III comparisons, use drop1() with the full scope, i.e. drop1(model_fit, scope = . ~ .).
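The drop1() call can be sketched on a concrete fit. This is a minimal illustration under assumed data (the built-in mtcars set, with cyl and am as factors), not output from generate_models().

```r
# Illustrative data choice (an assumption, not from this page)
dat <- transform(mtcars, cyl = factor(cyl), am = factor(am))
fit <- lm(mpg ~ cyl + am + cyl:am, data = dat)

# Default scope respects marginality: only the interaction can be dropped
default_drop <- drop1(fit)
default_drop

# Full scope: each term is dropped in turn while all other terms stay in the model
type3 <- drop1(fit, scope = . ~ ., test = "F")
type3[, c("Df", "Sum of Sq")]
```

The interaction row is the same under both scopes. The main-effect rows depend on the contrast coding in use for the factors, so they should be interpreted with that coding in mind.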

Examples

# NOT RUN {
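# assumption: generate_models() and the Fingers data come from the supernova package
library(supernova)

# create all Type II comparison models
mod <- lm(Thumb ~ Height * Sex, data = Fingers)
mods_2 <- generate_models(mod, type = 2)

# compare the simple and complex models for the Height term
mod_Height <- anova(mods_2[["Height"]]$simple, mods_2[["Height"]]$complex)

# row 2 of the comparison table holds the SS attributable to Height
mod_Height[["Sum of Sq"]][[2]]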
# }