lessR (version 1.5.2)

reg: Regression Analysis

Description

Provides a comprehensive regression analysis from a single function call with default settings in place. By default the data should be attached as a data frame called mydata, such as that provided by the rad function included in this package for reading and processing data in preparation for analysis.

The default analysis provides the model's parameter estimates and corresponding inferential analyses, goodness of fit, the ANOVA table, correlation matrix of the model's variables, analysis of residuals, and confidence and prediction intervals. By default the residual analysis lists the data and fitted value for each observation as well as the residual, Studentized residual and Cook's distance, with the first 25 observations listed and sorted by Cook's distance. The output for the confidence and prediction intervals also provides the data and fitted value for each observation, as well as the lower and upper bounds for each of the two intervals. The observations are sorted by the lower bound of each prediction interval. Also, for models with a single predictor variable, a scatterplot of the data is produced, along with the regression line and corresponding confidence and prediction intervals.

Overriding the default settings can turn off features and reduce the number of provided analyses.

Usage

reg(my.formula, dframe=mydata, graph=TRUE, cor=TRUE,
    res.rows=NULL, res.sort=c("cooks","rstudent","off"), 
    pred=TRUE, pred.sort=c("predint", "off"), sig.digits=NULL)

Arguments

my.formula
Standard R formula for specifying a model. For example, for a response variable named Y and two predictor variables, X1 and X2, specify the corresponding linear model as Y ~ X1 + X2.
dframe
Default is mydata, the name of the data frame that contains the data. The default name is consistent with the name given by the rad function for reading the data, also available in this pack
graph
Default is TRUE. If there is one predictor variable in the model, a scatterplot with regression line is produced. If prediction intervals are requested, both the confidence and prediction intervals are added to the graph.
cor
Default is TRUE, which prints a correlation matrix of the model variables.
res.rows
Default is 25, which lists the first 25 rows of data sorted by the specified sort criterion. To turn this option off, specify a value of 0. To see the output for all observations, specify a value of "all".
res.sort
Default is "cooks", for specifying Cook's distance as the sort criterion for the display of the rows of data and associated residuals. Other values are "rstudent" for Studentized residuals, and "off" to not pr
pred
Default is TRUE, which produces confidence and prediction intervals for each row of data.
pred.sort
Default is "predint", which sorts the rows of data and associated intervals by the lower bound of each prediction interval. Turn off this sort by specifying a value of "off".
sig.digits
Provides the same functionality as the standard options function regarding the digits option. The distinction is that this value applies selectively to portions of the output, the different type of r

Details

The reg function automatically provides a variety of regression analyses. The basic analysis successively invokes three standard R functions: lm, summary and confint. The residual analysis invokes fitted, resid, rstudent, and cooks.distance. The option for prediction intervals calls the standard R function predict, once with the argument interval="confidence" and once with interval="prediction". If there is only one predictor variable in the model, a scatterplot of the data with regression line is produced, along with the plotted confidence and prediction intervals.
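
As a rough illustration of the sequence described above, the following sketch calls the same base R functions directly. It is not the lessR source code: the seed, the object names (fit, res, conf_int, pred_int) and the layout of the results are assumptions made for this example only.

```r
# Sketch only: the base R calls named in Details, not the reg implementation.
set.seed(1)                                  # hypothetical seed, for reproducibility
X1 <- rnorm(20)
X2 <- rnorm(20)
Y  <- .7*X1 + .2*X2 + .6*rnorm(20)
mydata <- data.frame(X1, X2, Y)

# Basic analysis: lm, summary, confint
fit <- lm(Y ~ X1 + X2, data = mydata)
print(summary(fit))                          # estimates, tests, goodness of fit
print(confint(fit))                          # confidence intervals of the coefficients

# Residual analysis: fitted, resid, rstudent, cooks.distance
res <- data.frame(mydata,
                  fitted   = fitted(fit),
                  residual = resid(fit),
                  rstudent = rstudent(fit),
                  cooks    = cooks.distance(fit))

# Intervals: predict called twice, once per interval type.
# predict() warns that prediction intervals on the fitted data refer to
# future responses; the warning is suppressed here for readability.
conf_int <- predict(fit, interval = "confidence")
pred_int <- suppressWarnings(predict(fit, interval = "prediction"))
```

Both predict calls return a matrix with columns fit, lwr and upr, one row per observation.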

The output for the residual analysis displays by default just the first 25 observations with the largest values of Cook's distance, sorted by this criterion. The output of the prediction intervals is re-organized so that each row's computed fitted value and prediction interval are listed adjacent to the corresponding values of the predictor variables and response variable. Each row of information, the data and corresponding intervals, is by default sorted by the lower bound of the prediction interval.
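
The two default sorts can be approximated in base R as sketched below. The object names and the seed are hypothetical, and ascending order for the lower prediction bound is an assumption, since the text does not state the direction of that sort.

```r
# Sketch only: approximates the default sorting, not the reg implementation.
set.seed(1)                                  # hypothetical seed
mydata <- data.frame(X1 = rnorm(20), X2 = rnorm(20))
mydata$Y <- .7*mydata$X1 + .2*mydata$X2 + .6*rnorm(20)
fit <- lm(Y ~ X1 + X2, data = mydata)

# Residual output: largest Cook's distance first, first 25 rows at most
res <- data.frame(mydata,
                  fitted   = fitted(fit),
                  residual = resid(fit),
                  rstudent = rstudent(fit),
                  cooks    = cooks.distance(fit))
res_sorted <- res[order(res$cooks, decreasing = TRUE), ]
print(head(res_sorted, 25))                  # only 20 rows exist in this example

# Prediction output: data next to fit, lwr, upr, sorted by the lower bound
# (ascending order is assumed here)
pred_int <- suppressWarnings(predict(fit, interval = "prediction"))
out <- data.frame(mydata, pred_int)
out_sorted <- out[order(out$lwr), ]
print(out_sorted)
```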

The options function is called to turn off the stars for different significance levels (show.signif.stars=FALSE) and to turn off scientific notation for the output (scipen=30).
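
A minimal sketch of these two settings follows. Saving the previous values and restoring them afterwards is a choice made for this example; the text does not say whether reg restores them.

```r
# Sketch only: the two options named above, set and then restored.
old <- options(show.signif.stars = FALSE, scipen = 30)

# With a large scipen penalty, small numbers are shown in fixed notation
fixed_text <- format(1e-10)
print(fixed_text)

# Restore the previous settings (an assumption of this example)
options(old)
```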

The purpose of reg is to combine these function calls into one, and provide ancillary analyses such as sorting where appropriate to assist in interpretation.

See Also

formula, lm, summary.lm, anova, confint, fitted, resid, rstudent, cooks.distance

Examples

# Generate random data
X1 <- rnorm(20)
X2 <- rnorm(20)
Y <- .7*X1 + .2*X2 + .6*rnorm(20)
mydata <- data.frame(X1, X2, Y)
attach(mydata)

# Call reg for a one-predictor regression
# Provide all default analyses including scatterplot etc.
reg(Y ~ X1)

# Call reg according to a multiple regression model
# Provide the full range of default analyses
reg(Y ~ X1 + X2)

# Call reg and modify the default settings as specified
reg(Y ~ X1 + X2, res.rows=8, res.sort="rstudent", sig.digits=8, pred=FALSE)