In many fields, decisions and outcomes are categorical even though the underlying phenomena are probably continuous. For example, students are accepted to graduate school or not, and they finish or not; X-rays are read as showing cancer or not. Outcomes of such decisions are usually labeled as Valid Positives, Valid Negatives, False Positives and False Negatives. In hypothesis testing, False Positives are known as Type I errors, while False Negatives are Type II errors. The relationship between these four cells depends upon the correlation between the decision rule and the outcome, as well as the level of evidence needed for a decision (the criterion). Signal Detection Theory and Decision Theory have a number of related measures of performance: Accuracy (VP + VN), Sensitivity (VP/(VP + FN)), Specificity (VN/(FP + VN)), d prime (d'), and the area under the Receiver Operating Characteristic curve (AUC). More generally, these are examples of correlations based upon dichotomous data. AUC addresses some of these questions.
AUC(t=NULL,BR=NULL,SR=NULL,Phi=NULL,VP=NULL,labels=NULL,plot="b",zero=TRUE,correct=.5,
col=c("blue","red"))
t        a 4 x 1 vector or a 2 x 2 table of TP, FP, FN, TN values (see below). May be counts or proportions.
BR       Base Rate of successful outcomes or actual symptom (if t is not specified)
SR       Selection Rate for candidates or diagnoses (if t is not specified)
Phi      The Phi correlation coefficient between the predictor and the outcome variable (if t is not specified)
VP       The number of Valid Positives (selected applicants who succeed; correct diagnoses) (if t and Phi are not specified)
labels   Names of variables 1 and 2
plot     "b" (both), "d" (decision theory), "a" (auc), or "n" (neither)
zero     If TRUE, the noise distribution is centered at zero
correct  Cell values of 0 are replaced with correct (see tetrachoric for a discussion of why this is needed)
col      The color choice for the VP and FP, defaults to c("blue","red") but could be c("grey","black") to avoid colors
The returned (and printed) values include:
- the phi coefficient of the two by two table
- the tetrachoric (latent) coefficient inferred from the two by two table
- the biserial correlation of the continuous state of the world with the decision
- the observed input (as a check)
- the observed values divided by the total number of observations
- the conditional probabilities, prob / rowSums(prob)
- Accuracy: the percentage of True Positives + True Negatives
- Sensitivity: VP/(VP + FN)
- Specificity: VN/(FP + VN)
- d': the difference of the standardized rates, z(VP*) - z(FP*)
- beta: the ratio of ordinates at the decision point
The problem of making binary decisions about the state of the world is ubiquitous. We see this in Null Hypothesis Significance Testing (NHST), medical diagnosis, and selection for occupations. Whether framed as NHST, Signal Detection Theory, clinical assessment, or college admissions, all of these domains share the same 2 x 2 decision task.
Although the underlying phenomena are probably continuous, a typical decision or diagnostic situation requires dichotomous choices: accept or reject, correctly identified or incorrectly identified. In Signal Detection Theory, the world has two states: Noise versus Signal + Noise. The decision is whether there is a signal or not.
In diagnoses, it is whether to diagnose an illness or not given some noisy signal (e.g., an X-Ray, a set of diagnostic tests).
In college admissions, we accept some students and reject others. Four to five years later we observe who "succeeds" or graduates.
All of these decisions lead to four cells based upon a 2 x 2 categorization. Given that the true state of the world is Positive or Negative, and that a rater assigns positive or negative ratings, the resulting 2 x 2 table has Valid Positives and Valid Negatives on the diagonal and False Positives and False Negatives off the diagonal.
When expressed as percentages of the total, the Base Rates (BR) depend upon the state of the world, but the Selection Ratios (SR) are under the control of the person making the decision and affect the number of False Positives and the number of Valid Positives (see the sketch following the table below).
Given a 2 x 2 table of counts or percentages:

             Decide +   Decide -
  True +        VP         FN        BR
  True -        FP         VN        1 - BR
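The margins (BR and SR) and the phi correlation jointly determine the four cells, which is why AUC can be called with BR, SR, and Phi in place of a table. The following is a minimal sketch of that reconstruction, using the standard identity phi = (VP - BR*SR) / sqrt(BR*(1-BR)*SR*(1-SR)); the helper function is illustrative only and is not necessarily how the package computes the cells internally.

# Illustrative sketch (not psych's internal code): recover the 2 x 2 cell
# proportions from the Base Rate, Selection Ratio, and Phi correlation.
cellsFromPhi <- function(BR, SR, Phi) {
  VP <- BR * SR + Phi * sqrt(BR * (1 - BR) * SR * (1 - SR))
  FN <- BR - VP        # row margin:    VP + FN = BR
  FP <- SR - VP        # column margin: VP + FP = SR
  VN <- 1 - VP - FN - FP
  matrix(c(VP, FP, FN, VN), 2, 2,
         dimnames = list(c("True +", "True -"), c("Decide +", "Decide -")))
}
cellsFromPhi(BR = .05, SR = .254, Phi = .317)   # the Wiggins (p. 251) case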
Unfortunately, although this way of categorizing the data is typical in assessment (e.g., Wiggins, 1973), where everything is expressed as a percentage of the total, some decision papers express VP as the ratio of VP to all truly positive cases, that is, VP/(VP + FN) (e.g., Wickens, 1984). This requires dividing through by the row totals (the Base Rates); these conditional rates are represented as VP* and FP* in the table below.
The relationships implied by these data can be summarized as a phi or tetrachoric correlation between the raters and the world, or as a decision process with several alternative measures: Sensitivity, Specificity, Accuracy, Area Under the Curve, and d' (d prime). These measures may be defined as follows:
  Measure        Definition
  Sensitivity    VP/(VP + FN)
  Specificity    VN/(FP + VN)
  Accuracy       VP + VN
  VP*            VP/(VP + FN)
  FP*            FP/(FP + VN)
  d'             z(VP*) - z(FP*)
  d'             sqrt(2) z(AUC)
  beta           prob(X|S) / prob(X|N)
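As a worked illustration, the following sketch computes these quantities from a vector of counts supplied in the order VP, FP, FN, VN, using the standard equal-variance Gaussian expressions for d', beta, and AUC. It is meant only to make the definitions concrete; the helper name is illustrative and the results may differ in minor details from what AUC itself reports.

# Illustrative sketch: the measures defined above, computed from counts
# supplied in the order VP, FP, FN, VN.
sdtMeasures <- function(counts) {
  p  <- counts / sum(counts)                 # express cells as proportions of the total
  VP <- p[1]; FP <- p[2]; FN <- p[3]; VN <- p[4]
  VPstar <- VP / (VP + FN)                   # hit rate (Sensitivity)
  FPstar <- FP / (FP + VN)                   # false alarm rate (1 - Specificity)
  dprime <- qnorm(VPstar) - qnorm(FPstar)    # d' = z(VP*) - z(FP*)
  c(Accuracy    = VP + VN,
    Sensitivity = VPstar,
    Specificity = VN / (FP + VN),
    d.prime     = dprime,
    beta        = dnorm(qnorm(VPstar)) / dnorm(qnorm(FPstar)),  # ratio of ordinates
    AUC         = pnorm(dprime / sqrt(2)))   # from d' = sqrt(2) z(AUC)
}
sdtMeasures(c(140, 60, 100, 900))   # first Metz example, for comparison with AUC()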
Although only one point is observed, we can form a graphical display of VP* versus FP* as a smooth curve traced out as a function of the decision criterion. The smooth curve assumes normality, whereas the empirical curve merely consists of the two line segments connecting the points (0,0), (FP*, VP*), and (1,1). The resulting correlation between the inferred continuous state of the world and the dichotomous decision process is a biserial correlation.
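A sketch of that display under the same normality assumption follows; it is illustrative only and is not the plotting code that AUC uses. The smooth curve traces VP* = pnorm(d' + qnorm(FP*)) over all possible criteria, while the empirical curve is just the two line segments.

# Illustrative sketch: smooth (Gaussian) ROC through one observed point
# versus the empirical two-segment curve, using the first Metz example.
VPstar <- 140 / (140 + 100)                    # hit rate
FPstar <- 60  / (60 + 900)                     # false alarm rate
dprime <- qnorm(VPstar) - qnorm(FPstar)
fa <- seq(.001, .999, length.out = 200)        # a range of false alarm rates
plot(fa, pnorm(dprime + qnorm(fa)), type = "l",
     xlab = "FP*", ylab = "VP*", main = "Smooth vs. empirical ROC")
lines(c(0, FPstar, 1), c(0, VPstar, 1), lty = 2)   # the two line segments
points(FPstar, VPstar)                             # the single observed point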
When using table input, the values may be counts (and thus greater than 1) or probabilities (which should sum to 1). Base Rates and Selection Ratios are proportions and thus less than 1.
Metz, C.E. (1978) Basic principles of ROC analysis. Seminars in Nuclear Medicine, 8, 283-298.
Wiggins, Jerry S. (1973) Personality and Prediction: Principles of Personality Assessment. Addison-Wesley.
Wickens, Christopher D. (1984) Engineering Psychology and Human Performance. Merrill.
phi, phi2tetra, Yule, Yule.inv, Yule2phi, tetrachoric and polychoric, comorbidity
AUC(c(30,20,20,30)) #specify the table input
AUC(c(140,60,100,900)) #Metz example with colors
AUC(c(140,60,100,900),col=c("grey","black")) #Metz example 1 no colors
AUC(c(80,120,40, 960)) #Metz example 2 Note how the accuracies are the same but d's differ
AUC(c(49,40,79,336)) #Wiggins p 249
AUC(BR=.05,SR=.254,Phi = .317) #Wiggins 251 extreme Base Rates