These functions create a set of plots, one based on the real data and the others based on a modification of the data that makes the null hypothesis true. The user then tries to pick out the plot that represents the real data.
vis.test(..., FUN, nrow=3, ncol=3, npage=3, data.name = "", alternative)
vt.qqnorm(x, orig=TRUE)
vt.normhist(x, ..., orig=TRUE)
vt.scatterpermute(x, y, ..., orig=TRUE)
vt.tspermute(x, type='l', ..., orig=TRUE)
vt.residpermute(model, ..., orig=TRUE)
vt.residsim(model, ..., orig=TRUE)
The vis.test function returns an object of class htest with the following components:
The string "Visual Test"
The name of the data passed to the function
The number of correct "guesses"
The p-value based on the number of correct "guesses"
The number of rows per page
The number of columns per page
The number of pages
A list with 3 vectors containing the seeds set before calling FUN; the correct plot has an NA
A vector of length npage indicating the number of the figure picked in each of the npage tries
The other functions are run for their side effects and do not return anything meaningful.
...: Data and arguments to be passed on to FUN or to plotting functions; see details below
FUN: The function to create the plots on the original or null hypothesis data
nrow: The number of rows of graphs per page
ncol: The number of columns of graphs per page
npage: The number of pages to use in the testing
data.name: Optional character string for the name of the data in the output
alternative: Optional character string for the alternative hypothesis in the output
orig: Logical; should the original data be plotted (TRUE), or data based on the null hypothesis (FALSE)?
x: Data or x-coordinates of the data
y: y-coordinates of the data
type: Type of plot, passed on to the plot function (use 'p' for points)
model: An lm object, or any model object for which fitted and resid return vectors
Greg Snow 538280@gmail.com
The p-value is based on the assumption that under the null hypothesis there is a 1/(nrow*ncol) chance of picking the correct plot and that the npage choices are independent of each other. This may not be true if the user is familiar with the data or remembers details of the plot between picks.
The vis.test function will create an nrow by ncol grid of plots, one of which is based on the real (original) data while the others are based on a null hypothesis simulation (a statistical "lineup"). The real plot is placed at random within the set. The user then clicks on their best guess of which plot is the real one (the one most different from the others). If the null hypothesis is true for the real data, then this will be a guess with a 1/(nrow*ncol) probability of success. This process is repeated for a total of npage times. A p-value is then constructed from the number of correct guesses and the null hypothesis that there is a 1/(nrow*ncol) chance of guessing correctly each time (this works best if the person doing the choosing has not already seen plots or summaries of the data).
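The p-value calculation described above can be sketched as an upper-tail binomial probability. This is a minimal illustration, assuming the test is one-sided (at least as many correct picks as were observed); lineup_pvalue is a hypothetical helper name and is not part of the package.

```r
# Hypothetical helper: chance of at least `correct` right picks in `npage`
# independent tries, each with success probability 1/(nrow*ncol).
lineup_pvalue <- function(correct, nrow = 3, ncol = 3, npage = 3) {
  p0 <- 1 / (nrow * ncol)
  # P(X >= correct) for X ~ Binomial(npage, p0)
  pbinom(correct - 1, size = npage, prob = p0, lower.tail = FALSE)
}

lineup_pvalue(3)  # all three picks correct on 3 x 3 pages: (1/9)^3
lineup_pvalue(0)  # no correct picks: 1
```

With the default 3 x 3 layout and 3 pages, a single correct pick gives a p-value of about 0.30 under this sketch, so more pages or larger grids are needed before one or two correct picks become convincing.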
If the plotting function (FUN) is not passed as a named argument, then the first argument (in the ...) that is a function will be used. If no functions are passed, then the function will stop with an error.
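The lookup rule above (take the first function found among the unnamed arguments, otherwise stop) can be illustrated as follows. This is only a sketch of the idea, not the package's code; first_function and its error message are hypothetical.

```r
# Hypothetical sketch: return the first function found among the arguments,
# or stop with an error if none of the arguments is a function.
first_function <- function(...) {
  args <- list(...)
  is_fun <- vapply(args, is.function, logical(1))
  if (!any(is_fun)) stop("no plotting function supplied")
  args[[which(is_fun)[1]]]
}

# The data vector is skipped and `mean` (the first function) is returned.
identical(first_function(1:10, mean, median), mean)
```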
The plotting function (FUN) can be an existing function or a user supplied function. The function must have an argument named "orig" which indicates whether to plot the original data or the null hypothesis data. A new seed will be set before each call to FUN except when orig is TRUE. Inside the function, if orig is TRUE then the function should plot the original data. When orig is FALSE the function should do some form of simulation based on the data with the null hypothesis true and plot the simulated data (making sure to give no signs that it is different from the original plot).
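A user supplied plotting function following these rules might look like the sketch below. The name vt.boxpermute, its arguments y and g, and the invisible return value are all assumptions made for illustration; the only requirement taken from the text is the orig argument and the identical appearance of the real and null plots.

```r
# Hypothetical lineup function: side-by-side boxplots of y by group g.
# Null hypothesis: the distribution of y does not depend on g.
vt.boxpermute <- function(y, g, ..., orig = TRUE) {
  if (!orig) {
    g <- sample(g)  # shuffle group labels so the null hypothesis holds
  }
  boxplot(y ~ g, ...)  # same plotting call for real and null data
  invisible(g)         # labels actually plotted, returned for inspection
}

y <- c(rnorm(20, 10), rnorm(20, 12))
g <- rep(c("a", "b"), each = 20)
vt.boxpermute(y, g)                # real data
vt.boxpermute(y, g, orig = FALSE)  # one null plot
```

Because both branches end in the same boxplot call, the axis labels and layout do not reveal which plot is the real one.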
The return object includes a list with the seeds set before each of the plots (NA for the original data plot) and a vector of the plots selected by the user. This information can be used to recreate the simulated plots by setting the seed and then calling FUN.
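The idea of recreating a simulated plot from its seed can be sketched as follows. The function null_qq is a stand-in written from the description of vt.qqnorm, not the package code, and the seed value s is hypothetical; in practice it would be read from the seeds stored in the returned object. The sketch also assumes the seed is applied with set.seed immediately before the plotting function is called.

```r
# Stand-in lineup function (an illustration, not the package code):
# normal Q-Q plot of the data, or of normal data with matching mean and sd.
null_qq <- function(x, orig = TRUE) {
  if (!orig) x <- rnorm(length(x), mean(x), sd(x))
  qqnorm(x)
  invisible(x)  # return the plotted values so they can be compared
}

x <- rexp(25, 1/3)
s <- 12345  # hypothetical seed value

set.seed(s); first  <- null_qq(x, orig = FALSE)
set.seed(s); replay <- null_qq(x, orig = FALSE)
identical(first, replay)  # TRUE: same seed, same simulated data
```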
The vt.qqnorm function tests the null hypothesis that a vector of data comes from a normal distribution (or at least pretty close) by creating a qqnorm plot of the original data, or the same plot of random data from a normal distribution with the same mean and standard deviation as the original data.
The vt.normhist function tests the null hypothesis that a vector of data comes from a normal distribution (or at least pretty close) by plotting a histogram with a reference line representing a normal distribution of either the original data or a set of random data from a normal distribution with the same mean and standard deviation as the original.
The vt.scatterpermute function tests the null hypothesis of "no relationship" between 2 vectors of data. When orig is TRUE the function creates a scatterplot of the 2 variables. When orig is FALSE the function first permutes the y variable randomly (removing any relationship) and then creates a scatterplot with the original x and permuted y variables.
The vt.tspermute function creates a time series type plot of a single vector against its index. When orig is FALSE, the vector is permuted before plotting.
The vt.residpermute function takes a regression object (class lm, or any model type object for which fitted and resid return vectors) and draws a residual plot with the fitted values on the x axis and the residuals on the y axis. A loess smooth curve (scatter.smooth is the plotting function) and a reference line at 0 are included. When orig is FALSE the residuals are randomly permuted before being plotted.
The vt.residsim function takes a regression object (class lm, or any model type object for which fitted and resid return vectors) and draws a residual plot with the fitted values on the x axis and the residuals on the y axis. A loess smooth curve (scatter.smooth is the plotting function) and a reference line at 0 are included. When orig is FALSE the residuals are simulated from a normal distribution with mean 0 and the same standard deviation as the residuals.
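The residual simulation just described can be sketched in a few lines of base R. The function resid_sim_plot is a stand-in written from the description above, not the package code, and the model fitted to the built-in cars data is only an example.

```r
# Stand-in following the description of vt.residsim (not the package code).
resid_sim_plot <- function(model, ..., orig = TRUE) {
  f <- fitted(model)
  r <- resid(model)
  # Null data: normal residuals with mean 0 and the observed standard deviation
  if (!orig) r <- rnorm(length(r), mean = 0, sd = sd(r))
  scatter.smooth(f, r, ...)  # points plus loess smooth curve
  abline(h = 0)              # reference line at 0
  invisible(r)               # residuals actually plotted
}

fit <- lm(dist ~ speed, data = cars)
resid_sim_plot(fit)                # real residuals
resid_sim_plot(fit, orig = FALSE)  # simulated residuals
```

Replacing the rnorm call with sample(r) would give the permutation variant described for vt.residpermute.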
Buja, A., Cook, D., Hofmann, H., Lawrence, M., Lee, E.-K., Swayne, D.F. and Wickham, H. (2009) Statistical inference for exploratory data analysis and model diagnostics. Phil. Trans. R. Soc. A 367, 4361-4383. doi: 10.1098/rsta.2009.0120
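library(TeachingDemos)
if(interactive()) {
  # Skewed (exponential) data: the real Q-Q plot should stand out
  x <- rexp(25, 1/3)
  vis.test(x, vt.qqnorm)

  # Normal data: the null hypothesis is true, so each pick is a guess
  x <- rnorm(100, 50, 3)
  vis.test(x, vt.normhist)
}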