ranktes: Tests of independence between two rankings

Description

Performs various independence tests based on rank correlations.

Usage

ranktes(r, n, index = "", approx = "exact", CC = FALSE, type = "two-sided",
	 print = TRUE)

Arguments

the value of the test statistic.

the number of ranks.

index

a character string that specifies the rank correlation that is used in the test statistic. Acceptable values are: "spearman","kendall","gini","r4", "fy1" (FY-means), "fy2" (FY-medians) and "sbz" (Symmetrical Borroni-Zenga). Only enough of the string to be unique is required.

approx

a character string that specifies the type of approximation to the null distribution of the statistic required in index: "vggfr", "exact","gaussian","student". Only enough of the string to be unique is required.

if true, a continuity correction is applied. Ignored if approx= "exact" or if index="r4" or if index="fy1" or if index="fy2" or if index="sbz".

type

type of alternative hypothesis. The options are "two-sided" (independence), "greater" (concordance) or "less" (discordance). Only enough of the string to be unique is required.

FALSE suppresses some of the output.

Value

A list containing the following components:

number of ranks

statistic

type of rank correlation

observed value of the coefficient

approx

type of approximation

tails

type of alternative hypothesis

Cpv

conservative $p$-value

Lpv

liberal $p$-value

Lambda

if approx="vggfr" returns the vector containing the two shape parameters of the VGGFR density that best fits the null distribution of the required coefficients. It is NULL otherwise

Details

Upon computing $r_h$, it is common practice to determine whether the value is large enough, in absolute terms, to lead to the conclusion that the rank correlation coefficient that would be obtained from the entire set of permutations, say it $\rho_h$ is different from zero. To this end, we consider the test $H_0:\rho_h=0$ against:

$H_1:\rho_h>0$ (concordance). Only a large positive $r_h$ can be considered in line with this alternative hypothesis. To be considered significant, $r_h$ must be greater and its value must be equal to or larger than the critical values corresponding to the prespecified $\alpha$ level of significance.

$H_1:\rho_h<0$ (discordance). Only a large negative $r_h$ value will provide support for this alternative hypothesis. More specifically, $r_h$ must be less and $|r_h|$ must be equal to or larger than the critical values corresponding to the prespecified $\alpha$ level of significance.

$H_1:\rho_h\ne 0$ (independence). Either a large negative or a large positive value of $r_h$ are conform to this alternative hypothesis. A significant value of $r_h$ is obtained if the prespecified $\alpha$ level of significance is equal to or greater than 2$Prob(-|r_h|)$ where $Prob(.)$ is the probability density used to approximate the null distribution of $r_h$.

It is important to note that, as correctly observe Iman and Conover (1978), the discreteness of rank correlations often leads into situations where no critical region has exactly the size $\alpha$. If approx="exact" the routine provides the next smaller exact size called conservative $p$-value or the next larger exact size called liberal $p$-value. A test can be considered conclusive if both the conservative and the liberal significance levels lie on the same side with respect of $\alpha$. Compared to an exact $p$-value, a conservative $p$-value tends to understate the evidence against $H_0$, whereas a liberal $p$-value tends to overstate it.

If approx="exact" then ranktes uses precomputed exact null distributions. In particular, $n\le 26$ for $r_1$ (see Gustafson, 2009); $n\le 24$ for $r_3$ (see Girone et al. 2010). We have obtained the null distribution of $r_4$, $r_5$, $r_6$ and $r_7$ for $n\le 15$ by calculating rank correlations for all the $n!$ permutations of the integers $1,2, \cdots, n$. Kendall's $r_2$ benefits from a recurrence relationship (Panneton and Robillard, 1972) that can handle $r_2$ for $n$ up $60$. In this regard, it is very useful the package Mpfr.

In the case approx $\ne$ "exact", this routine computes critical values and significance levels by applying the density function indicated in approx. Now, of course, conservative and liberal p-values coincide.

The statistics involved in t-Student approximations are:

$$r_h^+=r_h\sqrt{\frac{m_{h,a}}{1-r_h^2}} \sim t_{m_{h,b}}, h=1,\cdots,4$$

where $m_{1,a}=n-2$, $m_{2,a}=[9n(n-1)/(4n+10)-1]$, $m_{3,a}=[3(n-1)(n^2-k_n)/2(n^2+2+ k_n)]$ with $k_n=n\ mod \ 2$ and $m_{4,a}=2(n-2.01524)/1.00762$. Furthermore, $m_{1,b}=n-2$, $m_{2,b}=\lfloor m_{2,a} \rfloor$, $m_{3,b}=\lfloor m_{3,b}+0.5 \rfloor$, $m_{4,b}=\lfloor m_{4,a} \rfloor$.

The t-Student approximations of $r_1$ and the Gaussian approximation of Kendall's $r_2$ are well known. Vittadini (1996) proposes the t-Student approximation to Kendall's $r_2$. Landenna et. al. (1989) suggest the t-Student for the Gini cograduation coefficient $r_3$. Tarsitano and Amerise (2015) developed a t-Student approximation to the null distribution of $r_4$. Terry (1952) has pointed out that the t-distribution with $n-2$ degrees of freedom provides a good approximation to the null distribution of $r_5^+$. The t-student approximation of $r_6$ and $r_7$ have not yet been developed.

The statistics involved in the Gaussian distribution are:

$$r_1^*=\frac{r_1}{\sqrt{n-1}}, \ r_2^*=r_2\sqrt{\frac{4n+10}{9n(n-1)}}, \ r_3^*=\frac{r_3}{\sqrt{1.5n}}, \ r_4^*=\frac{r_4}{\sqrt{1.00762(n-1)}}$$

$$r_5^*=\frac{r_5}{\sqrt{n-1}}, \ r_6^*=\frac{r_6}{\sqrt{n-1}}, \ r_7^*=\frac{r_7}{\sqrt{1.806452n}}$$

Cifarelli and Regazzini (1977) give the Gaussian approximation to the null distribution of Gini coefficient. See also Genest et al. (2010). The Gaussian approximation to the null distribution of $r_4$ is obtained in Tarsitano and Amerise (2015). Borroni (2013) derived the Gaussian approximation for $r_7$. The Gaussian approximations to $r_5$, $r_6$ and $r_7$ are asymptotic results.

If approx="vggfr" the Vianelli generalized Gaussian distribution with finite range is fitted by using the method of moments.

References

Borroni, G. C. (2013). "A new rank correlation measure". Statistical Papers, 54, 255--270.

Cifarelli, D. M. and Regazzini, E. (1977). "On a distribution-free test of independence based on Gini's rank association coefficient". Recent Developments in Statistics (Proceedings of the European Meeting of Statisticians, Grenoble, 1976), Amsterdam, North-Holland, 375--385.

Genest, N. B. and Neslehova, J. and Ben Ghorbal, N. (2010). "'Spearman's footrule and Gini's gamma: a review with complements" Journal of Nonparametric Statistics, 22, 937--954.

Girone, G. et al. (2010). "La distribuzione campionaria dell'indice di cograduazione di Gini per dimensioni campionarie fino a 24". Annali del Dipartimento di Scienze Statistiche "Carlo Cecchi" - Universita` di Bari, 24, 246--271.

Gustafson, L. (2009). rho null distribution. Available at http://www.luke-g.com/

Iman, L. and Conover, W. J. (1978). "Approximations of the critical region for 's rho with and without ties presen"t. Communication in Statistics - Simulation and Computation, 7, 269--282.

Landenna, G. and Scagni, A. and Boldrini, M. (1989). "An approximated distribution of the Gini's rank association coefficient". Communications in Statistics. Theory and Methods, 18, 2017--2026.

Tarsitano, A. and Amerise, I. L. (2015). "On a measure of rank-order association". Journal of Statistical and Econometric Methods, 4, 83--105.

Tarsitano, A. and Amerise, I. L. (2016). "Modelling the null distribution of rank correlations". Submitted.

Terry, M. E. (1952). "Some rank order tests which are most powerful against specific parameteric alternatives". Annals of Mathematical Statistics, 23, 346--366.

Vittadini, G. (1996). "Una famiglia di distribuzione per i test di associazione". In Atti della XXXVIII riunione scientifica della S.I.S., Rimini 9-13 Aprile 1996, 2, 521--528.

Examples

Run this code

# NOT RUN {
	
# G. P. Watkins (1933). An Ordinal Index of Correlation, Journal of the 
# American Statistical Association, 28:182, 139-151.
#  20-item series for area and density have been made up to cover the
# original 13 states and the four others earliest admitted to the Union.
State<-c("Georgia","North_Carolina","New_York","Luisiana","Pennsylvania",
"Virginia","Tennessee","Ohio","Kentucky","Maine","South_Carolina",
"West_Virginia","Maryland","Vermont","New_Hampshire","Massachusetts",
"New_Jersey","Connecticut","Delaware","Rhode_Island")
Area<-c(1,2,	3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)
Density<-c(17,13,5,18,6,14,12,7,11,20,15,10,8,19,16,2,3,4,9,1)
op<-par(mfrow=c(1,1))
plot(Area,Density,main="",xlab="Area",ylab="Density",pch=19,cex=0.9,
col="darkgreen" )
abline(h=mean(Area),col="black",lty=2,lwd=1)
abline(v=mean(Density),col="darkblue",lty=2,lwd=1)
par(op)
r<-comprank(Area,Density,"fy2","wgh")$r
ranktes(r, length(Area), "fy2", "ga",FALSE, "two", TRUE)
#####
#
# }
# NOT RUN {
data(Atar);attach(Atar)
op<-par(mfrow=c(1,1))
plot(TBL,TFL,main="",xlab="Backward Linkage Index",ylab=
"Forward Linkage Index",pch=19, cex=0.9,col="magenta")
abline(h=mean(TFL),col="black",lty=2,lwd=1)
abline(v=mean(TBL),col="black",lty=2,lwd=1)
par(op)
r<-comprank(TBL,TFL,"fy1","wgh")$r
ranktes(r, length(TBL), "fy1", "vggfr",FALSE, "two", TRUE)
detach(Atar)
# }
# NOT RUN {
#####
data(Sharpe);attach(Sharpe)
op<-par(mfrow=c(1,1))
plot(AVR,VAR, type = "p",pch=19,cex=1.1,col="tomato",main="Mutual fund 
performance")
text(AVR,VAR, labels = rownames(Sharpe), cex=0.5, pos=3)
abline(h=mean(AVR),col="black",lty=2,lwd=1)
abline(v=mean(VAR),col="black",lty=2,lwd=1)
par(op)
r<-comprank(AVR,VAR,"sbz","wgh")$r
ranktes(r, length(AVR), "sbz", "st",FALSE, "greater", TRUE)
detach(Sharpe)
#####
#
# }
# NOT RUN {
# Sun,J.-G. and Jurisicova, A. and Casper, R.F. (1997). "Detection of
# Deoxyribonucleic Acid Fragmentation in Human Sperm: Correlation
# with Fertilization In Vitro".  Biology of Reproduction, 56, 602-607.
n<-c(222,298,143,143,291,148)
r<-c(-0.18,-0.12,-0.16,-0.20,-0.06,-0.003)
App<-c("Ga","St","Vg")
N<-length(n);Ta<-matrix(NA,N,5)
for (i in 1:length(n)){Ta[i,1]<-r[i];Ta[i,2]<-n[i]
		for (j in 1:3){
				app<-App[j]
				a<-ranktes(r[i],n[i],"S",app,FALSE,"t",FALSE);Ta[i,2+j]<-a$Cpv
		}}
Df<-matrix(Ta,6,5)
rownames(Df)<-c("Conc. sperm/mL","Motility", "Fertilization rate",
"Cleavage rate", "Male age","Abstinence days")
colnames(Df)<-c("Spearman","n of samples","Appr. Gaussian",
"Appr. t-Student", "Appr. GGFR")
Df<-as.data.frame(Df)
print(round(Df,5))
# }
# NOT RUN {
#####
#
# }
# NOT RUN {
data(Starshi);attach(Starshi)
op<-par(mfrow=c(1,1))
plot(Sm15F,Sm15M, type = "p",pch=19,cex=0.9,col="darkorange",
main="Smokers ")
text(Sm15F,Sm15M,labels = rownames(Starshi),cex=0.6,pos=
c(1,rep(2,10),3,2))
abline(h=mean(Sm15M),col="black",lty=2,lwd=1)
abline(v=mean(Sm15F),col="black",lty=2,lwd=1)
par(op)
r<-comprank(Sm15F,Sm15M,"r4","wgh")$r
a<-ranktes(r, length(Sm15F), "r4", "ex",TRUE, "two", FALSE)
cat(a$Value,a$Cpv,a$Lpv,"\n")
r<-comprank(Sm15F,Sm15M,"sp","wgh")$r
a<-ranktes(r, length(Sm15F), "sp", "ex",TRUE, "two", FALSE)
cat(a$Value,a$Cpv,a$Lpv,"\n")
r<-comprank(Sm15F,Sm15M,"ke","wgh")$r
a<-ranktes(r, length(Sm15F), "ke", "ex",TRUE, "two", FALSE)
cat(a$Value,a$Cpv,a$Lpv,"\n")
r<-comprank(Sm15F,Sm15M,"gi","wgh")$r
a<-ranktes(r, length(Sm15F), "gi", "ex",TRUE, "two", FALSE)
cat(a$Value,a$Cpv,a$Lpv,"\n")
r<-comprank(Sm15F,Sm15M,"fy1","wgh")$r
a<-ranktes(r, length(Sm15F), "fy1", "ex",TRUE, "two", FALSE)
cat(a$Value,a$Cpv,a$Lpv,"\n")
r<-comprank(Sm15F,Sm15M,"fy2","wgh")$r
a<-ranktes(r, length(Sm15F), "fy2", "ex",TRUE, "two", FALSE)
cat(a$Value,a$Cpv,a$Lpv,"\n")
r<-comprank(Sm15F,Sm15M,"sbz","wgh")$r
a<-ranktes(r, length(Sm15F), "sbz", "ex",TRUE, "two", FALSE)
cat(a$Value,a$Cpv,a$Lpv,"\n")
detach(Starshi)
# }
# NOT RUN {
#####
#
# }
# NOT RUN {
All.App<-function(r,n,index,type){
# Computes p-values of an observed rank correlation statistic
	A<-rep(r,9)
	names(A)<-encodeString(c(index,"t-Student, CC=F","Gaussian, CC=F", "VGGFR,
	CC=F","t-Student, CC=T","Gaussian, CC=T", "VGGFR, CC=T",
	"Exact p Conservative","Exact p Liberal"),justify="right")
		a<-ranktes(r,n,index,"St",FALSE,type,FALSE);A[2]<-a$Cpv
		a<-ranktes(r,n,index,"Ga",FALSE,type,FALSE);A[3]<-a$Cpv
		a<-ranktes(r,n,index,"Vg",FALSE,type,FALSE);A[4]<-a$Cpv
		a<-ranktes(r,n,index,"St",TRUE,type,FALSE);A[5]<-a$Cpv
		a<-ranktes(r,n,index,"Ga",TRUE,type,FALSE);A[6]<-a$Cpv
		a<-ranktes(r,n,index,"Vg",TRUE,type,FALSE);A[7]<-a$Cpv
		a<-ranktes(r,n,index,"Ex",FALSE,type,FALSE);A[8]<-a$Cpv;A[9]<-a$Lpv
		A<-as.matrix(A)
return(A)}
data(Gabbs);attach(Gabbs)
B<-matrix(0,9,6)
colnames(B)<-colnames(Gabbs[1:6])
rownames(B)<-encodeString(c("index","t-Student, CC=F","Gaussian, CC=F",
"VGGFR, CC=F","t-Student, CC=T", "Gaussian, CC=T", "VGGFR, CC=T",
"Exact p Conservative","Exact p Liberal"), justify="right")
index<-"spearman"
for (i in 1:6){r<-comprank(Gabbs[,i],Gabbs[,7],index, print=FALSE)$r
					B[,i]<-All.App(r,19,index,"less")}
print(round(B,5))
detach(Gabbs)
# }
# NOT RUN {
#####
#
# }
# NOT RUN {
data(Dalyww);attach(Dalyww)
op<-par(mfrow=c(1,1))
plot(ACLS,ASHR,main="The paradox of high rates of suicide in happy places",
xlab="Adjusted Life Satisfaction", ylab="Adjusted Suicide Risk",pch=19,
cex=0.8,col="steelblue")
text(ACLS,ASHR,labels=rownames(Dalyww),cex=0.7,pos=2)
abline(h=mean(ASHR),col="black",lty=2,lwd=1)
abline(v=mean(ACLS),col="black",lty=2,lwd=1)
par(op)
r<-comprank(ACLS,ASHR,"spearman")$r;n<-length(ASHR)
out<-ranktes(r,n,"s","ga",FALSE,"greater",FALSE)
cat(round(out$Value,3),round(out$Cpv,5),round(out$Lpv,5),"\n")
out<-ranktes(r,n,"s","st",FALSE,"greater",FALSE)
cat(round(out$Value,3),round(out$Cpv,5),round(out$Lpv,5),"\n")
out<-ranktes(r,n,"s","vg",FALSE,"greater",FALSE)
cat(round(out$Value,3),round(out$Cpv,5),round(out$Lpv,5),"\n")
#
r<-comprank(ACLS,ASHR,"kendall")$r
out<-ranktes(r,n,"kendall","st",FALSE,"greater",FALSE)
cat(round(out$Value,3),round(out$Cpv,5),round(out$Lpv,5),"\n")
#
r<-comprank(ACLS,ASHR,"r4")$r
out<-ranktes(r,n,"r4","st",FALSE,"greater",FALSE)
cat(round(out$Value,3),round(out$Cpv,5),round(out$Lpv,5),"\n")
detach(Dalyww)
# }

Run the code above in your browser using DataLab