Abbreviation: av, av.brief
Analysis of variance from the R aov function plus graphics and effect sizes. Permitted designs are one-way between groups, two-way between groups, and randomized blocks with one treatment factor and one observation for each treatment and block combination.
Output is generated in distinct segments by topic, organized and displayed in sequence by default. When the output is assigned to an object, such as a in a <- ANOVA(Y ~ X), the full or partial output can be accessed for later analysis and/or viewing. A primary use of the saved output is with knitr for dynamic report generation, run from R directly or from within RStudio. The input instructions to knitr are written comments and interpretation with embedded R code, called R Markdown. A knitr analysis "knits" these comments and the subsequent output together so that the R output is embedded in the resulting document, either html, pdf or Word, by default with explanation and interpretation. Generate a complete, though at this time preliminary, R Markdown document, ready to knit, with the Rmd option. Simply specify the option with a file name and run the ANOVA function to create the file. Then open the newly created .Rmd file in RStudio and click the knit button to create a formatted document that consists of the statistical results and interpretative comments. See the sections arguments, value and examples for more information.
ANOVA(my.formula, data=mydata, rows=NULL,
      brief=getOption("brief"), digits.d=NULL,
      Rmd=NULL, graphics=TRUE,
      rb.points=TRUE, res.rows=NULL, res.sort=c("zresid", "fitted", "off"),
      pdf=FALSE, width=5, height=5, fun.call, …)

av(…)

av.brief(…, brief=TRUE)
my.formula: Standard R formula for specifying a model.
data: The data frame that contains the data for analysis. The default name is mydata; otherwise specify the name explicitly.
rows: A logical expression that specifies a subset of rows of the data frame to analyze.
brief: If set to TRUE, reduced text output with no Tukey multiple comparison of means and no residuals. The system default can be changed with the style function.
digits.d: For the Basic Analysis, the number of decimal digits displayed. For the rest of the output, it is a suggestion only.
Rmd: File name for the file of R Markdown instructions to be written, if specified. The file type is .Rmd, which automatically opens in RStudio, but it is a simple text file that can be edited with any text editor.
graphics: Produce graphics. Default is TRUE. With Rmd, it can be useful to set to FALSE so that regPlot can be used to place the graphics within the output file.
rb.points: For a randomized blocks design, a plot of the fitted value for each cell is obtained as well as the individual data values. Set to FALSE to suppress the data values.
res.rows: Default is 20, which lists the first 20 rows of data and residuals sorted by the specified sort criterion. To disable residuals, specify a value of 0. To see the residuals output for all observations, specify a value of "all".
res.sort: Default is "zresid", for specifying standardized residuals as the sort criterion for the display of the rows of data and associated residuals. Other values are "fitted" for the fitted values and "off" to not sort the rows of data.
pdf: Indicator of whether the graphic files should be saved as pdf files instead of directed to the standard graphics windows.
width: Width of the pdf file in inches.
height: Height of the pdf file in inches.
fun.call: Function call. Used with Rmd to pass the function call when obtained from the abbreviated function call av.
…: Other parameter values for the R function lm, which provides the core computations.
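The residual listing controlled by res.rows and res.sort can be approximated in base R. The following is a minimal sketch, not the lessR implementation: it assumes standardized residuals as returned by rstandard and a sort by absolute size with the largest first (the sort direction is an assumption). The names fit, res, n_show and shown are hypothetical.

```r
# Fit the one-way model with base R lm
fit <- lm(weight ~ group, data = PlantGrowth)

# Collect the data, fitted values, residuals and standardized residuals
res <- data.frame(PlantGrowth,
                  fitted = fitted(fit),
                  resid  = residuals(fit),
                  zresid = rstandard(fit))

# Sort by absolute standardized residual, largest first (assumed direction)
res <- res[order(abs(res$zresid), decreasing = TRUE), ]

# Keep the first n_show rows, analogous to the default of 20 rows
n_show <- 20
shown <- head(res, n_show)
print(shown)
```

Replacing the order call with order(res$fitted) would correspond to sorting by fitted values, and skipping the sort to the "off" setting.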
The output can optionally be returned and saved into an R object; otherwise it simply appears at the console. The components of this object are redesigned in lessR version 3.3.5 into (a) pieces of text that form the readable output and (b) a variety of statistics. The readable output consists of character strings, such as tables, amenable for viewing and interpretation. The statistics are numerical values amenable for further analysis, such as to be referenced in a subsequent R Markdown document. The motivation for these two types of output is to facilitate R Markdown documents, as the name of each piece, preceded by the name of the saved object followed by a $, can be inserted into the R Markdown document (see examples).
TEXT OUTPUT
out_background: variables in the model, number of rows of data and number retained
1-predictor: out_descriptive: descriptive stats
2-predictors: out_cell.n: cell sample size
2-predictors: out_cell.means: cell means
2-predictors: out_cell.marginals: marginal means
2-predictors: out_cell.gm: grand mean
2-predictors: out_cell.sd: cell standard deviations
out_anova: analysis of variance summary table
out_effects: effect sizes
out_hsd: Tukey's honestly significant difference analysis
out_res: residuals
out_plots: list of plots generated if more than one
Separated from the rest of the text output are the major headings, which can then be deleted from custom collations of the output.
out_title_bck: BACKGROUND
out_title_des: DESCRIPTIVE STATISTICS
out_title_basic: BASIC ANALYSIS
out_title_res: RESIDUALS
STATISTICS
call: function call that generated the analysis
formula: model formula that specifies the model
n.vars: number of variables in the model
n.obs: number of rows of data submitted for analysis
n.keep: number of rows of data retained in the analysis
residuals: residuals
fitted: fitted values
Although not typically needed for analysis, if the output is assigned to an object named, for example, a, then the complete contents of the object can be viewed directly with the unclass function, here as unclass(a). Invoking the class function on the saved object reveals a class of out_all. The class of each of the text pieces of output is out_piece.
OVERVIEW
The one-way ANOVA with Tukey HSD and corresponding plot is based on the R functions aov and TukeyHSD, and provides summary statistics for each level. Two-factor ANOVA also provides an interaction plot of the means with interaction.plot as well as a table of means and other summary statistics. The two-factor analysis can be between groups or a randomized blocks design. Residuals are displayed by default. Tukey HSD comparisons and residuals are not displayed if brief=TRUE.
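The base R functions named above can also be called directly. The following is a minimal sketch of the underlying computations only, not of the formatted output, effect sizes or other additions that ANOVA produces. The object names fit, hsd, level_means and level_sds are illustrative.

```r
# One-way ANOVA on the PlantGrowth data set provided with R
fit <- aov(weight ~ group, data = PlantGrowth)
print(summary(fit))

# Tukey HSD pairwise comparisons of the group means and their plot
hsd <- TukeyHSD(fit)
print(hsd)
plot(hsd)

# Summary statistics for each level of the treatment factor
level_means <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)
level_sds   <- tapply(PlantGrowth$weight, PlantGrowth$group, sd)
print(cbind(mean = level_means, sd = level_sds))

# Two-factor data: interaction plot of the cell means
with(warpbreaks, interaction.plot(tension, wool, breaks))
```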
The rows parameter subsets rows (cases) of the input data frame according to a logical expression. Use the standard R operators for logical statements as described in Logic, such as & for and, | for or and ! for not, and use the standard R relational operators as described in Comparison, such as == for logical equality, != for not equals, and > for greater than. See the Examples.
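To see which rows a given logical expression selects, the same expression can be evaluated against the data frame in base R. This sketch illustrates only the logical and relational operators. How ANOVA evaluates rows internally is not shown here, and the cutoff 4.5 and the names keep, sub and sub2 are arbitrary choices for illustration.

```r
# Rows that are not in the second treatment and have a weight above 4.5
keep <- with(PlantGrowth, group != "trt2" & weight > 4.5)
sub <- PlantGrowth[keep, ]

# The same selection written with subset()
sub2 <- subset(PlantGrowth, group != "trt2" & weight > 4.5)

# Rows in the control group or with a weight of at most 4
either <- subset(PlantGrowth, group == "ctrl" | !(weight > 4))

print(nrow(sub))
print(nrow(either))
```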
MODEL SPECIFICATION
In the following specifications, Y is the response variable, X is a treatment variable and Blocks is the blocking variable. The distinction between the one-way randomized blocks and the two-way between groups models is not the variable names, but rather the delimiter between the variable names. Use * to indicate a two-way crossed between groups design and + for a randomized blocks design.
one-way between groups: ANOVA(Y ~ X)
one-way randomized blocks: ANOVA(Y ~ X + Blocks)
two-way between groups: ANOVA(Y ~ X1 * X2)
For more complex designs, use the standard R function aov, upon which ANOVA depends.
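The effect of the delimiter can be inspected by passing the same formulas to aov directly and listing the model terms. This is a sketch using data sets provided with R. The warpbreaks data are used here only to contrast the two formulas, and the object names are illustrative.

```r
# one-way between groups: Y ~ X
fit_oneway <- aov(weight ~ group, data = PlantGrowth)

# two-way between groups: * requests both main effects and their interaction
fit_twoway <- aov(breaks ~ wool * tension, data = warpbreaks)

# randomized blocks: + requests an additive model with no interaction term
fit_blocks <- aov(breaks ~ wool + tension, data = warpbreaks)

# Compare the terms that each formula generates
labels_twoway <- attr(terms(fit_twoway), "term.labels")
labels_blocks <- attr(terms(fit_blocks), "term.labels")
print(labels_twoway)
print(labels_blocks)
```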
BALANCED DESIGN
The design for the two-factor analyses must be balanced. A check is performed and processing ceases if not balanced. For unbalanced designs, consider the function lmer in the lme4 package.
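Balance can be checked before running the analysis by tabulating the cell sizes. The following is one possible check in base R, offered as an assumption about what such a check involves rather than as the code ANOVA itself runs. The names cell_n and is_balanced are hypothetical.

```r
# Number of observations in each combination of the two factors
cell_n <- table(warpbreaks$wool, warpbreaks$tension)
print(cell_n)

# The design is balanced when every cell has the same number of observations
is_balanced <- length(unique(as.vector(cell_n))) == 1

if (!is_balanced) {
  stop("Design is not balanced: cell sizes differ")
}
```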
DECIMAL DIGITS
The number of decimal digits displayed on the output is, by default, the maximum number of decimal digits for all the data values of the response variable. Alternatively, this value can be explicitly specified with the digits.d parameter.
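As an illustration of the default rule, the maximum number of decimal digits of a response variable can be counted from the printed form of each value. This is a sketch under the assumption that digits are counted from a representation with up to 15 significant digits. It is not the lessR implementation, and count_decimals is a hypothetical helper.

```r
# Count the digits after the decimal point for each value of x
count_decimals <- function(x) {
  vapply(x, function(v) {
    txt <- format(v, scientific = FALSE, digits = 15)
    if (!grepl(".", txt, fixed = TRUE)) return(0L)  # no decimal point
    nchar(sub("^[^.]*\\.", "", txt))                # characters after the point
  }, integer(1))
}

# Maximum number of decimal digits across the response variable
max_digits <- max(count_decimals(PlantGrowth$weight))
print(max_digits)
```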
Gerbing, D. W. (2014). R Data Analysis without Programming, Chapter 7, NY: Routledge.
library(lessR)  # provides ANOVA, av and av.brief

# one-way ANOVA using the PlantGrowth data frame provided with R
ANOVA(weight ~ group, data=PlantGrowth)

# brief version
av.brief(weight ~ group, data=PlantGrowth)

# drop the second treatment, leaving just the control and one treatment
ANOVA(weight ~ group, data=PlantGrowth, rows=(group != "trt2"))

# variables of interest in a data frame that is not the default mydata
# two-factor between-groups ANOVA with replications and interaction
# warpbreaks is a data set provided with R
ANOVA(breaks ~ wool * tension, data=warpbreaks)

# randomized blocks design with the second term the blocking factor
# use short name
av(breaks ~ wool + tension, data=warpbreaks)