emplogitplot2: Empirical logit plot for one quantitative variable by categorical groups

Description

This function produces an empirical logit plot for a binary response variable with a single quantitative predictor variable, broken down by a single categorical factor.

Usage

emplogitplot2(formula, data = NULL, ngroups = 3, breaks = NULL,
  yes = NULL, padj = TRUE, out = FALSE, showplot = TRUE,
  showline = TRUE, ylab = "Log(Odds)", xlab = NULL,
  putlegend = "n", levelcol = NULL, pch = NULL, main = "",
  ylim = NULL, xlim = NULL, lty = NULL, lwd = 1, cex = 1)

Arguments

formula

A formula of the form (binary) Response~Quantitative Predictor+Factor

data

A dataframe

ngroups

Number of groups to use (not needed if breaks is used); ngroups="all" uses all unique values

breaks

A vector of endpoints for the bins (not needed if ngroups is used)

yes

Set a value for the response to be counted for proportions (optional)

padj

Should proportions be adjusted to avoid zero and one? (default is TRUE)

out

Should the function return a dataframe with group and factor information? (default is FALSE)

showplot

Show the plot? (default is TRUE)

showline

Show the regression lines? (default is TRUE)

ylab

Text label for the vertical axis (default is "Log(Odds)")

xlab

Text label for the horizontal axis (default is NULL)

putlegend

Position for the legend (default is "n" for no legend)

levelcol

Vector of colors for the factor levels

pch

Plot character for the dots

main

Title for plot

ylim

Limits for the vertical axis

xlim

Limits for the horizontal axis

lty

Line type (default is 1)

lwd

Line width (default is 1)

cex

Multiplier for plot symbols

Value

A dataframe with group information (if out=TRUE)

Details

Values of the quantitative explanatory variable will be grouped into ngroups roughly equal-sized groups, unless breaks is used to determine the boundaries of the groups. Using ngroups="all" will make each distinct value of the explanatory variable its own group.
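The grouping step can be illustrated with a small base R sketch. This is only one plausible way to form the groups, using quantile cut points, and is not taken from the function's source; make_groups and the vector x are hypothetical names and values introduced for illustration.

```r
# Hypothetical helper: illustrative only, not part of Stat2Data.
make_groups <- function(x, ngroups = 3, breaks = NULL) {
  if (identical(ngroups, "all")) {
    # Each distinct value becomes its own group
    return(factor(x))
  }
  if (is.null(breaks)) {
    # Quantile cut points give roughly equal-sized groups
    breaks <- quantile(x, probs = seq(0, 1, length.out = ngroups + 1))
  }
  # User-supplied breaks are used as given; values outside them become NA
  cut(x, breaks = unique(breaks), include.lowest = TRUE)
}

# Made-up values for illustration
x <- c(2.1, 2.8, 3.0, 3.2, 3.4, 3.5, 3.6, 3.7, 3.9)
table(make_groups(x, ngroups = 3))
nlevels(make_groups(x, ngroups = "all"))
```

With heavily tied data, quantile cut points can coincide, so the number of groups produced by a sketch like this may be smaller than ngroups.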

We find a proportion for the binary response variable within each of the groups created from the quantitative variable crossed with the categorical variable. To avoid problems with proportions of zero and one, we compute an adjusted proportion as (Number yes + 0.5)/(Number of cases + 1). This is converted to an adjusted log odds, log(adjp/(1-adjp)). What constitutes a "success" can be specified with yes=, and the proportion adjustment can be turned off (if no group proportions are likely to be zero or one) with padj=FALSE.
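The adjustment formula above can be checked with a few lines of base R. The helper name emp_logit and its arguments are hypothetical, chosen for this sketch; it mirrors the formula as stated here rather than the function's internal code.

```r
# Hypothetical helper: illustrative only, not part of Stat2Data.
emp_logit <- function(n_yes, n_cases, padj = TRUE) {
  if (padj) {
    p <- (n_yes + 0.5) / (n_cases + 1)  # adjusted proportion
  } else {
    p <- n_yes / n_cases                # raw proportion
  }
  log(p / (1 - p))                      # log odds
}

# A group where every case is a "yes"
emp_logit(n_yes = 10, n_cases = 10)                # finite, equals log(21)
emp_logit(n_yes = 10, n_cases = 10, padj = FALSE)  # Inf, since p = 1
```

The second call shows why the adjustment is the default: an unadjusted proportion of one (or zero) gives an infinite log odds that cannot be plotted or used in a line fit.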

The function plots the log odds versus the mean of the explanatory variable within each group, with a different color for each of the categories defined by the categorical variable. A least squares line is fit to these points within each categorical group. The plot can be suppressed with showplot=FALSE.
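As a rough picture of what this plotting step involves, the base R sketch below draws group means against log odds and adds one least squares line per factor level. The data frame pts, its column names, and its values are made up for illustration and are not output from emplogitplot2.

```r
# Made-up group summaries, for illustration only
pts <- data.frame(
  xmean   = c(3.0, 3.5, 3.8, 3.1, 3.4, 3.9),
  logodds = c(-1.2, 0.1, 1.4, -0.8, 0.3, 1.1),
  level   = factor(rep(c("F", "M"), each = 3))
)
cols <- c("red", "blue")

# Points colored by factor level
plot(logodds ~ xmean, data = pts, col = cols[pts$level], pch = 16,
     ylab = "Log(Odds)", xlab = "Group mean")

# One least squares line per factor level
fits <- lapply(split(pts, pts$level), function(d) lm(logodds ~ xmean, data = d))
for (i in seq_along(fits)) {
  abline(fits[[i]], col = cols[i])
}
```

Roughly parallel lines suggest the predictor has a similar effect on the log odds in each category, while clearly different slopes hint at an interaction.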

The out=TRUE option will return a dataframe with the boundaries of each group, proportion, adjusted proportion, mean explanatory variable, and (adjusted or unadjusted) log odds.

Examples

# NOT RUN {
data(MedGPA)
emplogitplot2(Acceptance~GPA+Sex,data=MedGPA)

GroupTable2=emplogitplot2(Acceptance~MCAT+Sex,ngroups=5,out=TRUE,data=MedGPA,putlegend="topleft")

emplogitplot2(Acceptance~MCAT+Sex,data=MedGPA,breaks=c(0,34.5,39.5,50.5),
              levelcol=c("red","blue"),putlegend="bottomright")
# }
