
lessR (version 2.6)

Logit: Logit Analysis

Description

Abbreviation: lgt

Based directly on the standard R glm function with family="binomial", Logit automatically provides a logit regression analysis with graphics from a single, simple function call with many default settings, each of which can be re-specified. By default the data are expected in a data frame named mydata, such as data read by the lessR Read function. Specify the model in the function call with an R formula: the response variable, followed by a tilde, followed by the list of predictor variables, with adjacent predictors separated by a plus sign.

Default output includes the inferential analysis of the estimated coefficients and model, sorted residuals and Cook's Distance, and sorted fitted values for existing data or new data. The default output also includes two or three graphs beginning with a histogram of the residuals with superimposed normal and general density curves. The second graph is a scatterplot of the fitted values with the residuals and the corresponding lowess curve. The point corresponding to the largest value of Cook's Distance is labeled accordingly. Also provided, for a model with one predictor variable, is a scatterplot of the data with regression line and confidence and prediction intervals.

Can also be called from the more general model function. The resulting scatterplot, when written to a pdf file according to pdf=TRUE, is named RegScatterplot.pdf. If residuals are reported, then the two additional pdf files are named RegResiduals.pdf and RegResidFitted.pdf. Their names and the directory to which they are written are provided as part of the console output.

Usage

Logit(my.formula, dframe=mydata, digits.d=4, text.width=120,
      res.rows=NULL, res.sort=c("cooks","rstudent","dffits","off"),
      pred=TRUE, pred.all=FALSE, pred.sort=TRUE, cooks.cut=1,
      X1.new=NULL, X2.new=NULL, X3.new=NULL, X4.new=NULL, X5.new=NULL,
      pdf=FALSE, pdf.width=5, pdf.height=5, ...)

lgt(...)

Arguments

my.formula
Standard R formula for specifying a model. For example, for a response variable named Y and two predictor variables, X1 and X2, specify the corresponding linear model as Y ~ X1 + X2.
dframe
Data frame that contains the data for analysis. The default name is mydata; otherwise specify the name explicitly.
digits.d
For the Basic Analysis, it provides the number of decimal digits. For the rest of the output, it is a suggestion only.
text.width
Width of the text output at the console.
res.rows
Default is 25, which lists the first 25 rows of data sorted by the specified sort criterion. To turn this option off, specify a value of 0. To see the output for all observations, specify a value of "all".
res.sort
Default is "cooks", which specifies Cook's distance as the sort criterion for the display of the rows of data and associated residuals. Other values are "rstudent" for Studentized residuals, and "off" to turn off sorting.
pred
Default is TRUE, which produces confidence and prediction intervals for each row of data.
pred.all
Default is FALSE, which produces prediction intervals only for the first, middle and last five rows of data.
pred.sort
Default is TRUE, which sorts the rows of data and associated intervals by the lower bound of each fitted value.
cooks.cut
Cutoff value of Cook's Distance at which observations with a larger value are flagged in red and labeled in the resulting scatterplot of Residuals and Fitted Values. Default value is 1.0.
X1.new
Values of the first listed predictor variable for which forecasted values and corresponding prediction intervals are calculated.
X2.new
Values of the second listed predictor variable for which forecasted values and corresponding prediction intervals are calculated.
X3.new
Values of the third listed predictor variable for which forecasted values and corresponding prediction intervals are calculated.
X4.new
Values of the fourth listed predictor variable for which forecasted values and corresponding prediction intervals are calculated.
X5.new
Values of the fifth listed predictor variable for which forecasted values and corresponding prediction intervals are calculated.
pdf
If TRUE, then graphics are written to pdf files.
pdf.width
Width of the pdf file in inches.
pdf.height
Height of the pdf file in inches.
...
Other parameter values for R function glm which provides the core computations.
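
As a rough illustration of what the X1.new and X2.new arguments ask for, the base-R sketch below computes fitted probabilities at new predictor values. It is not the lessR implementation. The mtcars data, the chosen values, and the use of expand.grid to form every combination of the new values are assumptions, and the intervals shown are Wald confidence intervals for the fitted probability, computed on the logit scale.

```r
# Sketch only: base R, not lessR. mtcars ships with R; am is coded 0/1.
fit <- glm(am ~ wt + hp, data = mtcars, family = "binomial")

# Assumed: every combination of the new values is of interest
new <- expand.grid(wt = c(2.5, 3.0, 3.5), hp = c(100, 150))

# Predict on the logit (link) scale, then map back to probabilities
link <- predict(fit, newdata = new, type = "link", se.fit = TRUE)
z <- qnorm(0.975)
new$fitted <- plogis(link$fit)
new$lower  <- plogis(link$fit - z * link$se.fit)
new$upper  <- plogis(link$fit + z * link$se.fit)

print(new, digits = 4)
```

Building the interval on the logit scale and transforming with plogis keeps both bounds inside (0, 1).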

Details

OVERVIEW The purpose of Logit is to combine the following function calls into one, as well as provide ancillary analyses such as graphics, organizing output into tables and sorting to assist interpretation of the output.

The basic analysis successively invokes several standard R functions beginning with the standard R function for estimation of the logit model, glm with family="binomial". The output of the analysis is stored in the object lm.out, available for further analysis in the R environment upon completion of the Logit function. By default Logit automatically provides the analyses from the standard R functions, summary, confint and anova, with some of the standard output modified and enhanced. The correlation matrix of the model variables is obtained with the cor function. The residual analysis invokes the fitted, resid, rstudent, and cooks.distance functions. The option for prediction intervals calls the standard generic R function predict. The lessR den function provides the histogram and density plots for the residuals and the ScatterPlot function provides the scatter plots of the residuals with the fitted values and of the data for the one-predictor model.
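
The sequence of standard R calls named above can be sketched directly in base R. This is an approximation of the workflow, not the lessR source. The mtcars data and the object names are assumptions, and confint.default (Wald intervals) is used here in place of confint so that the sketch depends only on the stats package.

```r
# Sketch only: the base-R functions named in the text, applied by hand.
# mtcars ships with R; am is coded 0/1 and wt is the single predictor.
fit <- glm(am ~ wt, data = mtcars, family = "binomial")

est <- summary(fit)$coefficients      # estimates, standard errors, z tests
ci  <- confint.default(fit)           # Wald confidence intervals (assumed choice)
dev <- anova(fit, test = "Chisq")     # analysis of deviance table
r   <- cor(mtcars[, c("am", "wt")])   # correlations of the model variables

# Per-observation quantities used in the residual analysis
diagnostics <- data.frame(
  fitted   = fitted(fit),
  resid    = resid(fit, type = "response"),
  rstudent = rstudent(fit),
  cooks    = cooks.distance(fit)
)

print(est); print(ci); print(dev); print(r)
print(head(diagnostics))
```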

The default analysis provides the model's parameter estimates and corresponding hypothesis tests and confidence intervals, goodness of fit indices, the ANOVA table, analysis of residuals and influence as well as the fitted value and standard error for each observation in the model. The response variable must be binary with only numeric values of 0 and 1. See the examples of how to obtain exclusive 0 and 1 values from character data.
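
For readers working outside the lessR Transform and Recode functions, one possible base-R way to build a 0/1 response from character data is shown below. The vector gender and its values are hypothetical.

```r
# Sketch only: hypothetical character data with values "M" and "F"
gender <- c("M", "F", "F", "M", "F")

# A logical comparison converted to integer gives 1 for "F" and 0 for "M"
female <- as.integer(gender == "F")

# Confirm the response contains only 0 and 1 before fitting a logit model
stopifnot(all(female %in% c(0L, 1L)))
print(female)
```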

DATA FRAME The name mydata is by default provided by the Read function included in this package for reading and displaying information about the data in preparation for analysis. If the variables in the model are not all in the same data frame, the analysis will not be complete. The data frame does not need to be attached, just specified by name with the dframe option if the name is not the default mydata.

GRAPHICS Two or three default graphs are provided. By default the graphs are written to separate graphics windows, which may overlap each other completely, in which case move the top graphics window to see the others. Or, the graphics.save option may be invoked to save the graphs to a single pdf file called regOut.pdf. The directory to which the file is written is displayed on the console text output.

1. A histogram of the residuals includes the superimposed normal and general density plots from the den function included in this lessR package. The overlapping density plots, which both overlap the histogram, are filled with semi-transparent colors to enhance readability.

2. A scatterplot of the residuals with the fitted values is also provided from the ScatterPlot function included in this package. The point corresponding to the largest value of Cook's distance, regardless of its size, is plotted in red and labeled and the corresponding value of Cook's distance specified in the subtitle of the plot. Also by default all points with a Cook's distance value larger than 1.0 are plotted in red, a value that can be specified to any arbitrary value with the cooks.cut option. This scatterplot also includes the lowess curve.

3. For models with a single predictor variable, a scatterplot of the data is produced, which also includes the fitted values. As with the density histogram plot of the residuals and the scatterplot of the fitted values and residuals, the scatterplot includes a colored background with grid lines.
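
The second graph can be approximated with base graphics as follows. This is a sketch under assumptions, not the ScatterPlot implementation: the mtcars data, the colors, and the label placement are illustrative choices.

```r
# Sketch only: residuals against fitted values with a lowess curve,
# flagging the largest Cook's distance and any value above the cutoff.
fit <- glm(am ~ wt, data = mtcars, family = "binomial")

fv <- fitted(fit)
rs <- resid(fit, type = "response")
cd <- cooks.distance(fit)

cooks.cut <- 1                          # same default cutoff as in the text
flag <- cd > cooks.cut | cd == max(cd)  # largest value is always flagged

plot(fv, rs, pch = 19, col = ifelse(flag, "red", "black"),
     xlab = "Fitted values", ylab = "Residuals",
     sub = sprintf("Largest Cook's distance: %.3f", max(cd)))
lines(lowess(fv, rs), col = "blue")     # smooth trend through the residuals
text(fv[flag], rs[flag], labels = names(cd)[flag], pos = 3, cex = 0.7)
```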

RESIDUAL ANALYSIS By default the residual analysis lists the data and fitted value for each observation as well as the residual, Studentized residual, Cook's distance and dffits, with the first 20 observations listed and sorted by Cook's distance. The residual displayed is the actual difference between fitted and observed, that is, with the setting in the residuals of type="response". The res.sort option provides for sorting by the Studentized residuals or not sorting at all. The res.rows option provides for listing these rows of data and computed statistics for any specified number of observations (rows). To turn off the analysis of residuals, specify res.rows=0.
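
A table of this kind can be assembled by hand in base R, which may help in reading the Logit output. The sketch below is an approximation, not the lessR code. The mtcars data, the column names, and the variable n_rows are assumptions.

```r
# Sketch only: residual table sorted by Cook's distance, largest first
fit <- glm(am ~ wt, data = mtcars, family = "binomial")

res <- data.frame(
  mtcars[, c("wt", "am")],
  fitted   = fitted(fit),
  residual = resid(fit, type = "response"),  # observed minus fitted
  rstudent = rstudent(fit),
  dffits   = dffits(fit),
  cooks    = cooks.distance(fit)
)

res_sorted <- res[order(res$cooks, decreasing = TRUE), ]

n_rows <- 20   # number of rows to list (assumed, matching the text)
print(head(res_sorted, n_rows), digits = 4)
```

Sorting by another criterion only requires changing the column passed to order, for example abs(res$rstudent).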

INVOKED R OPTIONS The options function is called to turn off the stars for different significance levels (show.signif.stars=FALSE), to turn off scientific notation for the output (scipen=30), and to set the width of the text output at the console to 120 characters. The latter option can be re-specified with the text.width option. After Logit finishes with a normal termination, the options are reset to their values before the Logit function began executing.
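
The set-then-restore pattern described here can be sketched in a small wrapper. The function show_coefs is hypothetical and is not part of lessR. Unlike the behavior described above, this sketch uses on.exit, so the options are also restored if the function stops with an error.

```r
# Sketch only: set display options temporarily, then restore them.
# show_coefs is a hypothetical helper, not a lessR function.
show_coefs <- function(fit, text.width = 120) {
  # options() returns the previous values of the options it changes
  old <- options(show.signif.stars = FALSE, scipen = 30, width = text.width)
  on.exit(options(old), add = TRUE)   # restore on exit, even after an error
  print(summary(fit))
  invisible(NULL)
}

before <- options("width", "scipen", "show.signif.stars")
show_coefs(glm(am ~ wt, data = mtcars, family = "binomial"))
after <- options("width", "scipen", "show.signif.stars")
```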

COLORS Individual colors in the plot can be manipulated with options such as col.bars for the color of the histogram bars. A color theme for all the colors can be chosen for a specific plot with the colors option with the lessR function set. The default color theme is blue, but a gray scale is available with "gray", and other themes are available as explained in set, such as "red" and "green". Use the option ghost=TRUE for a black background, no gridlines and partial transparency of plotted colors.

VARIABLE LABELS Although standard R does not provide for variable labels, lessR can store the labels in a data frame called mylabels, obtained from the Read function. If this labels data frame exists, then the corresponding variable label is by default listed as the label for the horizontal axis and on the text output. For more information, see Read.

See Also

formula, glm, summary.glm, anova, confint, fitted, resid, rstudent, cooks.distance

Examples

# obtain numeric 0,1 values from character data
# Gender has values of "M" and "F"
Read(lessR.data="Employee")
# convert factor to integer, values 1 and 2
# Female is 1 and Male is 2 (alphabetical)
Transform(Gender=as.numeric(Gender))
# so create a new variable with numeric 0 and 1
# Male is 0, Female is 1
Recode(Gender, old=c(1,2), new=c(1,0))
# proceed with the logit regression
Logit(Gender ~ Years)

# short name
lgt(Gender ~ Years)

# Modify the default settings as specified
Logit(Gender ~ Years, res.rows=8, res.sort="rstudent", digits.d=8, pred=FALSE)

# Multiple logistic regression model
# Provide all default analyses
Logit(Gender ~ Years + Salary)

# Save the three plots as pdf files 4 inches square, gray scale
Logit(Gender ~ Years, pdf=TRUE, pdf.width=4, pdf.height=4, colors="gray")

# Specify new values of the predictor variables to calculate
#  forecasted values
# Specify an input data frame other than mydata
Read(lessR.data="Cars93", dframe=cars)
Logit(Source ~ HP + MidPrice, dframe=cars,
      X1.new=seq(100,250,50), X2.new=seq(10,60,10))
