auc.roc.plot: AUC ROC Plot

Description

auc.roc.plot creates a ROC plot for one dataset and one or more model predictions, and prints the AUC for each model as part of the legend. It also includes an option to mark several types of optimal thresholds along each ROC curve.

Usage

auc.roc.plot(DATA, threshold = 101, find.auc = TRUE, 
which.model = (1:(ncol(DATA) - 2)), na.rm = FALSE, 
xlab = "1-Specificity (false positives)", 
ylab = "Sensitivity (true positives)", main = "ROC Plot", 
model.names = NULL, color = NULL, line.type = NULL, lwd = 1, 
mark = 0, mark.numbers = TRUE, mark.color = NULL, 
opt.thresholds = NULL, opt.methods = NULL, req.sens, 
req.spec, obs.prev = NULL, smoothing = 1, add.legend = TRUE, 
legend.text = model.names, legend.cex = 0.8, add.opt.legend = TRUE, 
opt.legend.text = NULL, opt.legend.cex = 0.7, 
counter.diagonal = FALSE, pch = NULL, FPC, FNC, cost.line = FALSE)

Value

creates a graphical plot

Arguments

DATA

a matrix or dataframe of observed and predicted values where each row represents one plot and where columns are:

`DATA[,1]`	plot ID	text
`DATA[,2]`	observed values	zero-one values
`DATA[,3]`	predicted probabilities from first model	numeric (between 0 and 1)
`DATA[,4]`	predicted probabilities from second model, etc.	numeric (between 0 and 1)
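For illustration, a data frame with this layout can be built in base R as below. The column names, sample size, and simulated predictions are invented for this sketch; in practice DATA would hold your own observations and model outputs (or the package's SIM3DATA example data).

```r
# Hypothetical data in the layout auc.roc.plot() expects:
# column 1 = plot ID, column 2 = observed 0/1, columns 3+ = predicted probabilities.
set.seed(42)
n <- 200
observed <- rbinom(n, size = 1, prob = 0.3)

# Two simulated models; model_a separates presences from absences more strongly.
model_a <- plogis(-1 + 2.5 * observed + rnorm(n))
model_b <- plogis(-1 + 0.8 * observed + rnorm(n))

DATA <- data.frame(
  plotID   = paste0("plot", seq_len(n)),
  observed = observed,
  model_a  = model_a,
  model_b  = model_b
)

# Every column after the first two is treated as one model.
n_models <- ncol(DATA) - 2
```

Passing this object as auc.roc.plot(DATA) would then draw one ROC curve per prediction column, since which.model defaults to all columns from the third onward.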

threshold

cutoff values between zero and one used for translating predicted probabilities into 0/1 values; defaults to 101 (that is, 101 evenly spaced thresholds). It can be a single value between zero and one, a vector of values between zero and one, or a positive integer representing the number of evenly spaced thresholds to calculate.
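A minimal sketch of how the threshold argument can be interpreted, assuming that a single whole number greater than 1 is expanded into that many evenly spaced cutoffs from 0 to 1, and that a probability at or above a cutoff counts as a predicted presence. expand_thresholds() and classify() are hypothetical helpers, not functions of the package.

```r
# Hypothetical helper: turn the 'threshold' argument into a vector of cutoffs.
# Assumption: a single whole number greater than 1 means "this many evenly
# spaced cutoffs from 0 to 1"; anything else is used as the cutoffs directly.
expand_thresholds <- function(threshold) {
  if (length(threshold) == 1 && threshold > 1 && threshold == round(threshold)) {
    seq(0, 1, length.out = threshold)
  } else {
    threshold
  }
}

# Hypothetical helper: translate probabilities into 0/1 values at one cutoff.
# Assumption: a probability equal to the cutoff counts as a predicted presence.
classify <- function(prob, cutoff) as.integer(prob >= cutoff)

cutoffs <- expand_thresholds(101)   # 0.00, 0.01, ..., 1.00
classify(c(0.2, 0.5, 0.8), 0.5)     # returns 0 1 1
```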

find.auc

a logical indicating whether the area under the curve should be calculated

which.model

a number, or a vector of numbers, indicating which model(s) from DATA should be used; defaults to all models (1:(ncol(DATA) - 2))

na.rm

a logical indicating whether missing values should be removed

xlab

a title for the x axis

ylab

a title for the y axis

main

an overall title for the plot

model.names

a vector of the names of each model included in DATA to be used in the legend box

color

should each model be plotted in a different color. It can be a logical value (where TRUE = color and FALSE = black and white), or a vector of color codes specifying particular colors for each line.

line.type

should each model be plotted in a different line type. It can be a logical value (where TRUE = dashed lines and FALSE = solid lines), or a vector of codes specifying particular line types for each line.

lwd

line width

mark

particular thresholds to mark along each ROC plot, given in the same format as threshold. Note: if opt.thresholds = TRUE, argument mark will be ignored.

mark.numbers

a logical indicating whether the threshold value of each marked point along the ROC curve should be labeled next to the point

mark.color

should the marked thresholds be plotted in a different color for each model. A logical value where TRUE = same colors as the lines, and FALSE = marks are always black. Can also be specified as a vector of color codes. Note that in this case, it is one color per model, not one color per threshold.
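Each marked threshold becomes one point on the ROC curve at (1 - specificity, sensitivity). The sketch below computes those coordinates for three hypothetical mark values on a small made-up dataset, again assuming that prob >= threshold counts as a predicted presence; the label column holds the threshold values that mark.numbers = TRUE places next to each point.

```r
# Small made-up dataset: 4 presences (obs = 1) and 6 absences (obs = 0).
obs  <- c(0, 0, 0, 0, 1, 0, 1, 1, 0, 1)
prob <- c(0.053, 0.114, 0.207, 0.312, 0.356, 0.418, 0.615, 0.702, 0.753, 0.905)

# Hypothetical thresholds to mark, in the same format as 'threshold'.
mark <- c(0.25, 0.5, 0.75)

# One ROC point per marked threshold (assumes prob >= threshold = presence).
marks <- data.frame(
  threshold = mark,
  x = sapply(mark, function(th) mean(prob[obs == 0] >= th)),  # 1 - specificity
  y = sapply(mark, function(th) mean(prob[obs == 1] >= th))   # sensitivity
)
# Text shown next to each point when the thresholds are labeled.
marks$label <- format(marks$threshold)
```

On an existing base-graphics plot, points(marks$x, marks$y) and text(marks$x, marks$y, marks$label, pos = 4) would draw and label these points.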

opt.thresholds

logical indicating whether the optimal thresholds should be calculated and plotted, or a vector specifying thresholds to plot

opt.methods

what methods should be used to optimize thresholds. Argument can be given either as a vector of method names or method numbers. Possible values are:

1	`Default`	threshold=0.5
2	`Sens=Spec`	sensitivity=specificity
3	`MaxSens+Spec`	maximizes (sensitivity+specificity)/2
4	`MaxKappa`	maximizes Kappa
5	`MaxPCC`	maximizes PCC (percent correctly classified)
6	`PredPrev=Obs`	predicted prevalence=observed prevalence
7	`ObsPrev`	threshold=observed prevalence
8	`MeanProb`	mean predicted probability
9	`MinROCdist`	minimizes distance between ROC plot and (0,1)
10	`ReqSens`	user defined required sensitivity
11	`ReqSpec`	user defined required specificity
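To make the criteria in this table concrete, the sketch below evaluates several of them over a 101-point threshold grid on a small made-up dataset. It is an illustration of the definitions listed above, not the package's implementation; threshold_table() and pick() are hypothetical helpers, ties are resolved by taking the lowest threshold, and prob >= threshold is assumed to count as a predicted presence.

```r
# Small made-up dataset: 4 presences (obs = 1) and 6 absences (obs = 0).
obs  <- c(0, 0, 0, 0, 1, 0, 1, 1, 0, 1)
prob <- c(0.053, 0.114, 0.207, 0.312, 0.356, 0.418, 0.615, 0.702, 0.753, 0.905)

# Hypothetical helper: confusion-matrix summaries at each candidate threshold.
threshold_table <- function(obs, prob, thresholds) {
  t(sapply(thresholds, function(th) {
    pred <- prob >= th
    tp <- sum(pred & obs == 1);  fp <- sum(pred & obs == 0)
    fn <- sum(!pred & obs == 1); tn <- sum(!pred & obs == 0)
    n  <- length(obs)
    pcc <- (tp + tn) / n
    # Agreement expected by chance, used by Cohen's kappa.
    pe <- ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n^2
    c(threshold = th, sens = tp / (tp + fn), spec = tn / (tn + fp),
      pcc = pcc, kappa = (pcc - pe) / (1 - pe))
  }))
}

tab <- threshold_table(obs, prob, seq(0, 1, length.out = 101))

# Return the threshold at the best value of a criterion (first one on ties).
pick <- function(criterion, maximize = TRUE) {
  tab[if (maximize) which.max(criterion) else which.min(criterion), "threshold"]
}

opt <- c(
  Default        = 0.5,
  `Sens=Spec`    = pick(abs(tab[, "sens"] - tab[, "spec"]), maximize = FALSE),
  `MaxSens+Spec` = pick((tab[, "sens"] + tab[, "spec"]) / 2),
  MaxKappa       = pick(tab[, "kappa"]),
  MaxPCC         = pick(tab[, "pcc"]),
  MeanProb       = mean(prob),
  # Squared distance from the ROC point (1 - spec, sens) to the corner (0, 1).
  MinROCdist     = pick((1 - tab[, "spec"])^2 + (1 - tab[, "sens"])^2,
                        maximize = FALSE)
)
round(opt, 3)
```

On this toy data MaxSens+Spec, MaxKappa and MaxPCC all land at 0.32 while MinROCdist lands at 0.42, which shows that the criteria need not agree even on the same ROC curve.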

req.sens

a value between zero and one giving the user defined required sensitivity. Only used if opt.thresholds = TRUE. Note that req.sens = (1-maximum allowable errors for points with positive observations).

req.spec

a value between zero and one giving the user defined required specificity. Only used if opt.thresholds = TRUE. Note that req.spec = (1 - maximum allowable errors for points with negative observations).
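A sketch of the two user-defined criteria on a small made-up dataset. It assumes that ReqSens picks the highest threshold that still meets req.sens and ReqSpec the lowest threshold that meets req.spec; that choice is an assumption made for this illustration, and optimal.thresholds documents the package's actual rule.

```r
# Small made-up dataset: 4 presences (obs = 1) and 6 absences (obs = 0).
obs  <- c(0, 0, 0, 0, 1, 0, 1, 1, 0, 1)
prob <- c(0.053, 0.114, 0.207, 0.312, 0.356, 0.418, 0.615, 0.702, 0.753, 0.905)
thresholds <- seq(0, 1, length.out = 101)

# Sensitivity and specificity at every candidate threshold
# (assumes prob >= threshold counts as a predicted presence).
sens <- sapply(thresholds, function(th) mean(prob[obs == 1] >= th))
spec <- sapply(thresholds, function(th) mean(prob[obs == 0] <  th))

req.sens <- 0.75   # at most 25% of presences may be missed
req.spec <- 0.80   # at most 20% of absences may be predicted present

# Sensitivity never rises as the threshold rises, so take the highest
# threshold that still meets req.sens.
thr_req_sens <- max(thresholds[sens >= req.sens])
# Specificity never falls as the threshold rises, so take the lowest
# threshold that meets req.spec.
thr_req_spec <- min(thresholds[spec >= req.spec])
```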

obs.prev

observed prevalence for opt.methods = "PredPrev=Obs" and "ObsPrev". Defaults to the observed prevalence from DATA.
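The two prevalence-based criteria can be sketched as follows on a small made-up dataset. ObsPrev uses the prevalence directly as the threshold, while PredPrev=Obs searches for the threshold at which the predicted prevalence matches it; treating prob >= threshold as a predicted presence, and breaking ties toward the lowest threshold, are assumptions of this sketch.

```r
# Small made-up dataset: 4 presences (obs = 1) and 6 absences (obs = 0).
obs  <- c(0, 0, 0, 0, 1, 0, 1, 1, 0, 1)
prob <- c(0.053, 0.114, 0.207, 0.312, 0.356, 0.418, 0.615, 0.702, 0.753, 0.905)
thresholds <- seq(0, 1, length.out = 101)

obs.prev <- mean(obs)   # default: the observed prevalence in the data (0.4 here)

# 'ObsPrev': the prevalence itself is used as the threshold.
thr_obs_prev <- obs.prev

# 'PredPrev=Obs': the threshold whose predicted prevalence (share of plots
# predicted present) is closest to obs.prev.
pred_prev <- sapply(thresholds, function(th) mean(prob >= th))
thr_pred_prev <- thresholds[which.min(abs(pred_prev - obs.prev))]
```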

smoothing

smoothing factor for maximizing/minimizing. Only used if opt.thresholds = TRUE. Instead of finding the threshold that gives the max/min value, the function will average the thresholds of the given number of max/min values.
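The averaging described above can be sketched with a hypothetical helper, smoothed_optimum(), applied to made-up criterion values; it follows the description in this entry but is not the package's code.

```r
# Hypothetical helper: instead of the single best threshold, average the
# thresholds of the 'smoothing' best values of a criterion.
smoothed_optimum <- function(thresholds, criterion, smoothing = 1, maximize = TRUE) {
  best <- order(criterion, decreasing = maximize)
  mean(thresholds[best[seq_len(min(smoothing, length(best)))]])
}

thresholds <- c(0.1, 0.2, 0.3, 0.4, 0.5)
criterion  <- c(0.60, 0.70, 0.80, 0.78, 0.75)   # made-up criterion values

smoothed_optimum(thresholds, criterion, smoothing = 1)   # 0.3: the single maximum
smoothed_optimum(thresholds, criterion, smoothing = 3)   # 0.4: mean of 0.3, 0.4, 0.5
```

Averaging several near-optimal thresholds makes the chosen cutoff less sensitive to one noisy peak in the criterion, which helps when many thresholds give almost the same value.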

add.legend

a logical indicating if a legend for AUC lines should be added to plot

legend.text

a vector of text for the AUC legend, one entry per model. Defaults to 'model.names'.

legend.cex

cex for AUC legend

add.opt.legend

logical indicating if a legend for optimal threshold criteria should be included on the plot

opt.legend.text

a vector of text for the optimal threshold criteria legend. Defaults to text corresponding to 'opt.methods'.

opt.legend.cex

cex for optimization criteria legend

counter.diagonal

should a counter-diagonal line be plotted. Note: each ROC plot crosses this line at the point where sensitivity equals specificity for that model.

pch

plotting "character", i.e., symbol to use for the thresholds specified in mark. pch can either be a single character or an integer code for one of a set of graphics symbols. See help(points) for details.

FPC

False Positive Costs, or for C/B ratio C = 'net costs of treating nondiseased individuals'.

FNC

False Negative Costs, or for C/B ratio B = 'net benefits of treating diseased individuals'.

cost.line

a logical indicating whether the line representing the relative cost ratio should be added to the plot.
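This page defers the exact definition of the cost line to optimal.thresholds. The sketch below uses the standard ROC-analysis result that, for costs FPC and FNC and prevalence p, every point on a line of slope (FPC/FNC) * ((1 - p)/p) has the same expected misclassification cost; that cost.line draws exactly this line is an assumption. The costs and data are made up.

```r
# Small made-up dataset: 4 presences (obs = 1) and 6 absences (obs = 0).
obs  <- c(0, 0, 0, 0, 1, 0, 1, 1, 0, 1)
prob <- c(0.053, 0.114, 0.207, 0.312, 0.356, 0.418, 0.615, 0.702, 0.753, 0.905)
thresholds <- seq(0, 1, length.out = 101)

FPC <- 1   # hypothetical cost of a false positive (C)
FNC <- 3   # hypothetical cost of a false negative (B)

prev <- mean(obs)
# Slope of an iso-cost line in ROC space (x = 1 - specificity, y = sensitivity).
cost_slope <- (FPC / FNC) * ((1 - prev) / prev)

# Error counts at each threshold (assumes prob >= threshold = presence).
fp <- sapply(thresholds, function(th) sum(prob[obs == 0] >= th))
fn <- sapply(thresholds, function(th) sum(prob[obs == 1] <  th))
sens <- 1 - fn / sum(obs == 1)
spec <- 1 - fp / sum(obs == 0)

# The threshold with the lowest total cost ...
thr_min_cost <- thresholds[which.min(FPC * fp + FNC * fn)]
# ... is the ROC point first touched by a line of slope cost_slope
# swept down from the top-left corner.
thr_tangent <- thresholds[which.max(sens - cost_slope * (1 - spec))]
```

Both searches select the same threshold (0.32 on this toy data) because minimizing FPC * FP + FNC * FN is algebraically the same as maximizing sensitivity - cost_slope * (1 - specificity).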

Author

Elizabeth Freeman eafreeman@fs.fed.us

Details

Receiver Operating Characteristic (ROC) plots provide a threshold-independent method of evaluating the performance of presence/absence models. In a ROC plot the true positive rate (sensitivity) is plotted against the false positive rate (1.0-specificity) as the threshold varies from 0 to 1. A good model achieves a high true positive rate while the false positive rate is still relatively small; thus its ROC plot rises steeply at the origin and then levels off at a value near the maximum of 1. The ROC plot for a poor model (whose predictive ability is equivalent to random assignment) lies near the diagonal, where the true positive rate equals the false positive rate for all thresholds. The area under the ROC curve (AUC) is therefore a good measure of overall model performance: good models have an AUC near 1, while poor models have an AUC near 0.5.
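The sketch below traces ROC points over a 101-point threshold grid on a small made-up dataset and computes the AUC two standard ways: the trapezoidal area under those points, and the rank-based (Mann-Whitney) probability that a presence receives a higher prediction than an absence. It illustrates the definitions above and is not necessarily how the package computes the AUC it prints; prob >= threshold is assumed to count as a predicted presence.

```r
# Small made-up dataset: 4 presences (obs = 1) and 6 absences (obs = 0).
obs  <- c(0, 0, 0, 0, 1, 0, 1, 1, 0, 1)
prob <- c(0.053, 0.114, 0.207, 0.312, 0.356, 0.418, 0.615, 0.702, 0.753, 0.905)
thresholds <- seq(0, 1, length.out = 101)

# ROC points, from threshold 0 (corner (1, 1)) to threshold 1 (corner (0, 0)).
tpr <- sapply(thresholds, function(th) mean(prob[obs == 1] >= th))  # sensitivity
fpr <- sapply(thresholds, function(th) mean(prob[obs == 0] >= th))  # 1 - specificity

# Trapezoidal area under the ROC points (fpr decreases along the vector).
auc_trapezoid <- sum(-diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)

# Rank-based equivalent: share of presence/absence pairs in which the
# presence has the higher prediction (ties count one half).
p_pos <- prob[obs == 1]
p_neg <- prob[obs == 0]
auc_rank <- mean(outer(p_pos, p_neg, ">") + 0.5 * outer(p_pos, p_neg, "=="))
```

Both give 20/24 (about 0.83) here, because the grid separates every presence/absence pair of predictions; with a coarser grid the trapezoidal value is only an approximation of the rank-based one.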

mark can be used to mark particular thresholds along each ROC plot; alternatively, if opt.thresholds = TRUE, the function will find optimal thresholds by several criteria and plot them along each ROC curve.

See optimal.thresholds for more details on the optimization methods, and on the arguments req.sens, req.spec, obs.prev, smoothing, FPC, FNC, and cost.line.

Note: if too many methods are included in opt.methods, the graph will get very crowded.

Examples


library(PresenceAbsence)  # provides auc.roc.plot() and the SIM3DATA example data
data(SIM3DATA)

# Example 1: ROC curves and AUC legend for every model in SIM3DATA
auc.roc.plot(SIM3DATA)

# Example 2: also mark three optimal-threshold criteria on each curve
auc.roc.plot(	SIM3DATA,
			opt.thresholds=TRUE,
			opt.methods=c("Default","Sens=Spec","MinROCdist"))

# Example 3: plot only models 2 and 3, spelling out most arguments
auc.roc.plot(	SIM3DATA,
			threshold=101,
			which.model=c(2,3),
			model.names=c("model a","model b","model c"),
			na.rm=TRUE,
			xlab="1-Specificity (false positives)",
			ylab="Sensitivity (true positives)",
			main="ROC Plot", 
			color=TRUE,
			line.type=TRUE,
			lwd=1,
			mark=0,
			mark.numbers=TRUE,
			opt.thresholds=TRUE,
			opt.methods=c(1,2,4),
			req.sens=0.85,
			req.spec=0.85,
			obs.prev=NULL,
			add.legend=TRUE,
			legend.text=NULL,
			legend.cex=0.8,
			add.opt.legend=TRUE,
			opt.legend.text=NULL,
			opt.legend.cex=0.7,
			counter.diagonal=TRUE,
			pch=NULL)
