lgt
Based directly on the standard R glm function with family="binomial", Logit automatically provides a logit regression analysis with graphics from a single, simple function call with many default settings, each of which can be re-specified. By default the data exist in a data frame with the default name of mydata, such as data read by the lessR Read function. Specify the model in the function call according to an R formula, that is, the response variable followed by a tilde, followed by the list of predictor variables, each pair separated by a plus sign.
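As a minimal sketch of what the formula interface maps onto, the underlying estimation can be written directly with glm. The data frame below is simulated, and the variable names Y, X1 and X2 are placeholders taken from the formula example, not data shipped with lessR.

```r
# Simulated placeholder data: binary response Y, two numeric predictors
set.seed(1)
mydata <- data.frame(X1 = rnorm(100), X2 = rnorm(100))
mydata$Y <- rbinom(100, size = 1,
                   prob = plogis(0.5 * mydata$X1 - 0.25 * mydata$X2))

# Response, then a tilde, then the predictors separated by plus signs
my.formula <- Y ~ X1 + X2

# The standard R estimation on which Logit is based
fit <- glm(my.formula, data = mydata, family = "binomial")
coef(fit)
```

The call returns one intercept and one slope per predictor, on the log-odds scale.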
Default output includes the inferential analysis of the estimated coefficients and model, sorted residuals and Cook's Distance, and sorted fitted values for existing data or new data. The default output also includes two or three graphs beginning with a histogram of the residuals with superimposed normal and general density curves. The second graph is a scatterplot of the fitted values with the residuals and the corresponding lowess curve. The point corresponding to the largest value of Cook's Distance is labeled accordingly. Also provided, for a model with one predictor variable, is a scatterplot of the data with regression line and confidence and prediction intervals.
Can also be called from the more general model function.
The resulting scatterplot, when written to a pdf file according to pdf=TRUE, is named RegScatterplot.pdf. If residuals are reported, then the two additional pdf files are named RegResiduals.pdf and RegResidFitted.pdf. Their names and the directory to which they are written are provided as part of the console output.
Logit(my.formula, dframe=mydata, digits.d=4, text.width=120,
      res.rows=NULL, res.sort=c("cooks","rstudent","dffits","off"),
      pred=TRUE, pred.all=FALSE, pred.sort=TRUE, cooks.cut=1,
      X1.new=NULL, X2.new=NULL, X3.new=NULL, X4.new=NULL, X5.new=NULL,
      pdf=FALSE, pdf.width=5, pdf.height=5, ...)
lgt(...)
my.formula: Standard R formula for specifying a model. For example, for a response variable named Y and two predictor variables, X1 and X2, specify the corresponding linear model as Y ~ X1 + X2.
dframe: The default name of the data frame that contains the data is mydata, otherwise explicitly specify.
res.rows: Number of rows of data and residuals to list, by default the first 20. Specify 0 to turn off the residual analysis, or "all" to list all rows.
res.sort: Default is "cooks", for specifying Cook's distance as the sort criterion for the display of the rows of data and associated residuals. Other values are "rstudent" for Studentized residuals, and "off" for no sorting.
pred: Default is TRUE, which produces confidence and prediction intervals for each row of data.
pred.all: Default is FALSE, which produces prediction intervals only for the first, middle and last five rows of data.
pred.sort: Default is TRUE, which sorts the rows of data and associated intervals by the lower bound of each fitted value.
pdf: If TRUE, then graphics are written to pdf files.
...: Other parameter values for the R function glm, which provides the core computations.
The purpose of Logit is to combine the following function calls into one, as well as provide ancillary analyses such as graphics, organizing output into tables and sorting to assist interpretation of the output. The basic analysis successively invokes several standard R functions beginning with the standard R function for estimation of the logit model, glm with family="binomial". The output of the analysis is stored in the object lm.out, available for further analysis in the R environment upon completion of the Logit function. By default Logit automatically provides the analyses from the standard R functions summary, confint and anova, with some of the standard output modified and enhanced. The correlation matrix of the model variables is obtained with the cor function. The residual analysis invokes the fitted, resid, rstudent, and cooks.distance functions. The option for prediction intervals calls the standard generic R function predict. The lessR den function provides the histogram and density plots for the residuals and the ScatterPlot function provides the scatter plots of the residuals with the fitted values and of the data for the one-predictor model.
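The sequence of standard R calls named above can be sketched in base R as follows. This is only an illustration of the underlying functions on simulated data (the data frame d and its variables are made up), not a reproduction of the formatted Logit output.

```r
# Simulated placeholder data with a binary 0/1 response
set.seed(2)
d <- data.frame(X = rnorm(80))
d$Y <- rbinom(80, size = 1, prob = plogis(0.8 * d$X))

# Estimate the logit model
fit <- glm(Y ~ X, data = d, family = "binomial")

est <- summary(fit)$coefficients        # estimates, standard errors, z tests
ci  <- suppressMessages(confint(fit))   # confidence intervals for coefficients
dev <- anova(fit, test = "Chisq")       # sequential analysis of deviance
r   <- cor(d[, c("Y", "X")])            # correlations of the model variables

# Fitted probabilities with standard errors for each observation
p <- predict(fit, type = "response", se.fit = TRUE)
head(cbind(fit = p$fit, se = p$se.fit))
```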
The default analysis provides the model's parameter estimates and corresponding hypothesis tests and confidence intervals, goodness of fit indices, the ANOVA table, analysis of residuals and influence as well as the fitted value and standard error for each observation in the model. The response variable must be binary with only numeric values of 0 and 1. See the examples of how to obtain exclusive 0 and 1 values from character data.
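Apart from the lessR Transform and Recode route shown in the examples, a 0 and 1 response can also be derived in base R with a logical comparison. The vector below is a made-up illustration.

```r
# Hypothetical character data with values "M" and "F"
gender <- c("M", "F", "F", "M", "F")

# TRUE/FALSE from the comparison becomes 1/0
female <- as.integer(gender == "F")
female   # 0 1 1 0 1
```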
DATA FRAME
The name mydata is by default provided by the Read function included in this package for reading and displaying information about the data in preparation for analysis. If the variables in the model are not all in the same data frame, the analysis will not be complete. The data frame does not need to be attached, just specified by name with the dframe option if the name is not the default mydata.
GRAPHICS
Two or three default graphs are provided. By default the graphs are written to separate graphics windows (which may overlap each other completely, in which case move the top graphics window). Or, the pdf option may be invoked to save the graphs to pdf files. The directory to which the files are written is displayed in the console text output.
1. A histogram of the residuals includes the superimposed normal and general density plots from the den function included in this lessR package. The overlapping density plots, which both overlap the histogram, are filled with semi-transparent colors to enhance readability.
2. A scatterplot of the residuals with the fitted values is also provided from the ScatterPlot function included in this package. The point corresponding to the largest value of Cook's distance, regardless of its size, is plotted in red and labeled, with the corresponding value of Cook's distance given in the subtitle of the plot. Also by default all points with a Cook's distance larger than 1.0 are plotted in red, a threshold that can be set to any value with the cooks.cut option. This scatterplot also includes the lowess curve.
3. For models with a single predictor variable, a scatterplot of the data is produced, which also includes the fitted values. As with the density histogram plot of the residuals and the scatterplot of the fitted values and residuals, the scatterplot includes a colored background with grid lines.
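The second graph described above can be approximated with base R graphics. This is a rough sketch on simulated data, assuming response-type residuals; it does not reproduce the lessR styling, and the data and object names are illustrative only.

```r
# Simulated placeholder data and model
set.seed(3)
d <- data.frame(X = rnorm(60))
d$Y <- rbinom(60, size = 1, prob = plogis(d$X))
fit <- glm(Y ~ X, data = d, family = "binomial")

fv  <- fitted(fit)
res <- resid(fit, type = "response")
cd  <- cooks.distance(fit)

# Flag points above the cutoff, and always the largest Cook's distance
cooks.cut <- 1
i.max <- which.max(cd)
flag <- cd > cooks.cut
flag[i.max] <- TRUE

plot(fv, res, col = ifelse(flag, "red", "black"),
     xlab = "Fitted values", ylab = "Residuals",
     sub = paste("Largest Cook's distance:", round(cd[i.max], 3)))
lines(lowess(fv, res))                          # smooth trend of the residuals
text(fv[i.max], res[i.max], labels = i.max, pos = 3)
```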
RESIDUAL ANALYSIS
By default the residual analysis lists the data and fitted value for each observation as well as the residual, Studentized residual, Cook's distance and dffits, with the first 20 observations listed and sorted by Cook's distance. The residual displayed is the actual difference between fitted and observed, that is, with the setting in the residuals function of type="response". The res.sort option provides for sorting by the Studentized residuals or not sorting at all. The res.rows option provides for listing these rows of data and computed statistics for any specified number of observations (rows). To turn off the analysis of residuals, specify res.rows=0.
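A table of this kind can be assembled from the standard R influence functions. The following is a sketch on simulated data, with column names chosen here for illustration rather than taken from the Logit output.

```r
# Simulated placeholder data and model
set.seed(4)
d <- data.frame(X = rnorm(50))
d$Y <- rbinom(50, size = 1, prob = plogis(d$X))
fit <- glm(Y ~ X, data = d, family = "binomial")

# Data, fitted values and diagnostics side by side
out <- data.frame(d,
                  fitted   = fitted(fit),
                  residual = resid(fit, type = "response"),  # observed - fitted
                  rstudent = rstudent(fit),
                  dffits   = dffits(fit),
                  cooks    = cooks.distance(fit))

# Sort by Cook's distance, largest first, and list the first res.rows rows
res.rows <- 20
out <- out[order(out$cooks, decreasing = TRUE), ]
head(out, res.rows)
```

Sorting on out$rstudent instead, or skipping the order step, corresponds to the other sorting choices described above.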
INVOKED R OPTIONS
The options function is called to turn off the stars for different significance levels (show.signif.stars=FALSE), to turn off scientific notation for the output (scipen=30), and to set the width of the text output at the console to 120 characters. The latter option can be re-specified with the text.width option. After Logit is finished with a normal termination, the options are re-set to their values before the Logit function began executing.
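The save-and-restore pattern described here relies on the fact that options returns the previous values of whatever it changes. A minimal sketch, with the variable names old and width.before chosen for illustration:

```r
width.before <- getOption("width")

# Set the three options; the previous values are returned as a list
old <- options(show.signif.stars = FALSE, scipen = 30, width = 120)

# ... the analysis output would be produced here ...
getOption("width")   # 120

# Restore the settings that were in place before
options(old)
```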
COLORS
Individual colors in the plot can be manipulated with options such as col.bars for the color of the histogram bars. A color theme for all the colors can be chosen for a specific plot with the colors option with the lessR function set. The default color theme is "blue", but a gray scale is available with "gray", and other themes are available as explained in set, such as "red" and "green". Use the option ghost=TRUE for a black background, no grid lines and partial transparency of plotted colors.
VARIABLE LABELS
Although standard R does not provide for variable labels, lessR can store the labels in a data frame called mylabels, obtained from the Read function. If this labels data frame exists, then the corresponding variable label is by default listed as the label for the horizontal axis and on the text output. For more information, see Read.
formula, glm, summary.glm, anova, confint, fitted, resid, rstudent, cooks.distance
# obtain numeric 0,1 values from character data
# Gender has values of "M" and "F"
Read(lessR.data="Employee")
# convert factor to integer, values 1 and 2
# Female is 1 and Male is 2 (alphabetical)
Transform(Gender=as.numeric(Gender))
# so create a new variable with numeric 0 and 1
# Male is 0, Female is 1
Recode(Gender, old=c(1,2), new=c(1,0))
# proceed with the logit regression
Logit(Gender ~ Years)
# short name
lgt(Gender ~ Years)
# Modify the default settings as specified
Logit(Gender ~ Years, res.rows=8, res.sort="rstudent", digits.d=8, pred=FALSE)
# Multiple logistic regression model
# Provide all default analyses
Logit(Gender ~ Years + Salary)
# Save the three plots as pdf files 4 inches square, gray scale
Logit(Gender ~ Years, pdf=TRUE, pdf.width=4, pdf.height=4, colors="gray")
# Specify new values of the predictor variables to calculate
# forecasted values
# Specify an input data frame other than mydata
Read(lessR.data="Cars93", dframe=cars)
Logit(Source ~ HP + MidPrice, dframe=cars,
      X1.new=seq(100,250,50), X2.new=seq(10,60,10))
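For comparison, fitted probabilities at new predictor values can be obtained in base R with predict. The sketch below uses simulated stand-in data that only borrows the variable names from the example above, and it assumes that every combination of the new values is of interest, which may or may not match how Logit pairs X1.new and X2.new.

```r
# Simulated stand-in data (not the Cars93 data)
set.seed(5)
cars <- data.frame(HP = runif(90, 60, 300), MidPrice = runif(90, 8, 60))
cars$Source <- rbinom(90, size = 1, prob = plogis(-2 + 0.01 * cars$HP))

fit <- glm(Source ~ HP + MidPrice, data = cars, family = "binomial")

# Assumption: cross all new values of the two predictors
new <- expand.grid(HP = seq(100, 250, 50), MidPrice = seq(10, 60, 10))

# Fitted probability of Source = 1 for each combination
new$fitted <- predict(fit, newdata = new, type = "response")
head(new)
```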