Usage

kendallTrendTest(y, ...)
## S3 method for class 'formula':
kendallTrendTest(y, data = NULL, subset,
na.action = na.pass, ...)
## S3 method for class 'default':
kendallTrendTest(y, x = seq(along = y),
alternative = "two.sided", correct = TRUE, ci.slope = TRUE,
conf.level = 0.95, warn = TRUE, data.name = NULL, data.name.x = NULL,
parent.of.data = NULL, subset.expression = NULL, ...)
Arguments

y: an object containing data for the trend test. In the default method, y
must be a numeric vector of observations. In the formula method, y
must be a formula of the form y ~ 1 or y ~ x.

data: a matrix, data frame, or list (or an object coercible via
as.data.frame to a data frame) containing the variables in the model.
If not found in data, the variables are taken from environment(formula).

subset: an optional vector specifying a subset of observations to be used.

na.action: a function which indicates what should happen when the data contain
NAs. The default is na.pass.

x: a numeric vector of "predictor" values. The length of x
must equal the length of y.
Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are
allowed but will be removed. The default value is seq(along = y).

alternative: character string indicating the kind of alternative hypothesis.
The possible values are "two.sided" (tau not equal to 0; the default),
"less" (tau less than 0), and "greater" (tau greater than 0).

correct: logical scalar indicating whether to use the continuity correction.
The default is TRUE.

ci.slope: logical scalar indicating whether to compute a confidence interval
for the slope. The default is TRUE.

conf.level: numeric scalar between 0 and 1 indicating the confidence level
associated with the confidence interval for the slope. The default is 0.95.

warn: logical scalar indicating whether to print a warning message when y
does not contain at least two non-missing values,
or when x does not contain at least two unique non-missing values.
The default value is TRUE.

data.name: character string indicating the name of the data used for the
trend test. The default is deparse(substitute(y)).

data.name.x: character string indicating the name of the data used for x.
If x is not supplied this argument is ignored. When x is supplied,
the default value is deparse(substitute(x)).

parent.of.data: character string indicating the source of the data used for
the trend test.

subset.expression: character string indicating the expression used to subset
the data.

Value

A list of class "htest" containing the results of the hypothesis
test. See the help file for htest.object for details.
In addition, several components specific to this test are part of the list
returned by kendallTrendTest. To test for trend in the presence of
seasonality, see kendallSeasonalTrendTest.

Details

kendallTrendTest
performs Kendall's nonparametric test for a monotonic trend,
which is a special case of the test for independence based on Kendall's tau statistic
(see cor.test
). The slope is estimated using the method of Theil (1950) and
Sen (1968). When ci.slope=TRUE
, the confidence interval for the slope is
computed using Gilbert's (1987) Modification of the Theil/Sen Method.
Kendall's test for a monotonic trend is a special case of the test for independence
based on Kendall's tau statistic. The first section below explains the general case
of testing for independence. The second section explains the special case of
testing for monotonic trend. The last section explains how a simple linear
regression model is a special case of a monotonic trend and how the slope may be
estimated.
The General Case of Testing for Independence
Definition of Kendall's Tau
Let $X$ and $Y$ denote two continuous random variables with some joint
(bivariate) distribution. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$
denote a set of $n$ bivariate observations from this distribution, and assume
these bivariate observations are mutually independent. Kendall (1938, 1975) proposed
a test for the hypothesis that the $X$ and $Y$ random variables are
independent based on estimating the following quantity:
$$\tau = { 2 Pr[(X_2 - X_1)(Y_2 - Y_1) > 0] } - 1 \;\;\;\;\;\; (1)$$
The quantity in Equation (1) is called Kendall's tau, although this term is more
often applied to the estimate of $\tau$ (see Equation (2) below).
If $X$ and $Y$ are independent, then $\tau=0$. Furthermore, for most
distributions of interest, if $\tau=0$ then the random variables $X$ and
$Y$ are independent. (It can be shown that there exist some distributions for
which $\tau=0$ and the random variables $X$ and $Y$ are not independent;
see Hollander and Wolfe (1999, p.364)).
Note that Kendall's tau is similar to a correlation coefficient in that
$-1 \le \tau \le 1$. If $X$ and $Y$ always vary in the same direction,
that is if $X_1 < X_2$ always implies $Y_1 < Y_2$, then $\tau = 1$.
If $X$ and $Y$ always vary in the opposite direction, that is if
$X_1 < X_2$ always implies $Y_1 > Y_2$, then $\tau = -1$. If
$\tau > 0$, this indicates $X$ and $Y$ are positively associated.
If $\tau < 0$, this indicates $X$ and $Y$ are negatively associated.
Estimating Kendall's Tau
The quantity in Equation (1) can be estimated by:
$$\hat{\tau} = \frac{2S}{n(n-1)} \;\;\;\;\;\; (2)$$
where
$$S = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} sign[(X_j - X_i)(Y_j - Y_i)] \;\;\;\;\;\; (3)$$
and $sign()$ denotes the sign function:
$$sign(x) = \left\{ \begin{array}{rl} 1 & x > 0 \\ 0 & x = 0 \\ -1 & x < 0 \end{array} \right. \;\;\;\;\;\; (4)$$
The function
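As a concrete illustration (hypothetical data, not from the package help file), $S$ and $\hat{\tau}$ of Equations (2) and (3) can be computed by brute force and checked against R's built-in Kendall correlation:

```r
# Hypothetical data; no ties, so cor(..., method = "kendall")
# matches Equation (2) exactly.
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 5)
n <- length(y)

# Equation (3): S = sum over pairs i < j of sign[(X_j - X_i)(Y_j - Y_i)]
S <- 0
for (i in 1:(n - 1)) {
  for (j in (i + 1):n) {
    S <- S + sign((x[j] - x[i]) * (y[j] - y[i]))
  }
}

tau.hat <- 2 * S / (n * (n - 1))   # Equation (2)
tau.hat                            # 0.6
cor(x, y, method = "kendall")      # 0.6
```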
kendallTrendTest
uses the large sample approximation to the
distribution of $S$ under the null hypothesis, which is given by:
$$z = \frac{S - E(S)}{\sqrt{Var(S)}} \;\;\;\;\;\; (5)$$
where
$$E(S) = 0 \;\;\;\;\;\; (6)$$
$$Var(S) = \frac{n(n-1)(2n+5)}{18} \;\;\;\;\;\; (7)$$
Under the null hypothesis, the quantity $z$ defined in Equation (5) is
approximately distributed as a standard normal random variable.
Both Kendall (1975) and Mann (1945) show that the normal approximation is excellent
even for samples as small as $n=10$, provided that the following continuity
correction is used:
$$z = \frac{S - sign(S)}{\sqrt{Var(S)}} \;\;\;\;\;\; (8)$$
The function kendallTrendTest
performs the usual one-sample z-test using
the statistic computed in Equation (8) or Equation (5). The argument
correct
determines which equation is used to compute the z-statistic.
By default, correct=TRUE
so Equation (8) is used.
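A minimal sketch of the z-statistic, using hypothetical values of $S$ and $n$ (no ties assumed):

```r
# Hypothetical values of S and n, as they might come from Equation (3)
S <- 6
n <- 5

var.S <- n * (n - 1) * (2 * n + 5) / 18       # Equation (7)
z.uncorrected <- S / sqrt(var.S)              # Equation (5), since E(S) = 0
z.corrected   <- (S - sign(S)) / sqrt(var.S)  # Equation (8)

# Two-sided p-value from the standard normal approximation
p <- 2 * pnorm(-abs(z.corrected))
```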
In the case of tied observations in either the observed $X$'s and/or the observed
$Y$'s, the formula for the variance of $S$ given in Equation (7) must be
modified as follows:
$$Var(S) = \frac{n(n-1)(2n+5)}{18} - \frac{\sum_{i=1}^{g} t_i(t_i-1)(2t_i+5)}{18} - \frac{\sum_{j=1}^{h} u_j(u_j-1)(2u_j+5)}{18} + \frac{[\sum_{i=1}^{g} t_i(t_i-1)(t_i-2)][\sum_{j=1}^{h} u_j(u_j-1)(u_j-2)]}{9n(n-1)(n-2)} + \frac{[\sum_{i=1}^{g} t_i(t_i-1)][\sum_{j=1}^{h} u_j(u_j-1)]}{2n(n-1)} \;\;\;\;\;\; (9)$$
where $g$ denotes the number of tied groups in the $X$ observations, $t_i$ denotes
the size of the $i$'th tied group in the $X$ observations, $h$ denotes the number
of tied groups in the $Y$ observations, and $u_j$ denotes the size of the $j$'th
tied group in the $Y$ observations. With no ties, Equation (9) reduces to
Equation (7).
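The tie-corrected variance can be sketched in R as follows (hypothetical data; the formula implemented here is Kendall's (1975) tie correction, with t and u holding the sizes of the tied groups in the x and y observations):

```r
# Hypothetical data with ties in both x and y
x <- c(1, 1, 2, 3, 4, 5, 5)
y <- c(2, 2, 3, 3, 5, 6, 7)
n <- length(x)

# Sizes of the tied groups (groups of size 1 contribute nothing)
t <- as.vector(table(x))
u <- as.vector(table(y))

var.S <- n * (n - 1) * (2 * n + 5) / 18 -
  sum(t * (t - 1) * (2 * t + 5)) / 18 -
  sum(u * (u - 1) * (2 * u + 5)) / 18 +
  sum(t * (t - 1) * (t - 2)) * sum(u * (u - 1) * (u - 2)) /
    (9 * n * (n - 1) * (n - 2)) +
  sum(t * (t - 1)) * sum(u * (u - 1)) / (2 * n * (n - 1))

# The tie correction shrinks the no-ties variance of Equation (7)
var.S < n * (n - 1) * (2 * n + 5) / 18   # TRUE
```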
The Special Case of Testing for Monotonic Trend
A common application is testing for a monotonic trend in $Y$ over time. In this
case the $X$ variable is time (e.g., the index of the sampling occasion), which
is the default value of x for the function kendallTrendTest.
In the case where the $X$'s represent time and are all distinct, the test for
independence between $X$ and $Y$ is equivalent to testing for a monotonic
trend in $Y$, and the test statistic $S$ simplifies to:
$$S = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} sign(Y_j - Y_i) \;\;\;\;\;\; (10)$$
Also, the formula for the variance of $S$ in the presence of ties (under the
null hypothesis $H_0: \tau = 0$) simplifies to:
$$Var(S) = \frac{n(n-1)(2n+5)}{18} - \frac{\sum_{j=1}^{h} u_j(u_j-1)(2u_j+5)}{18} \;\;\;\;\;\; (11)$$
A form of the test statistic $S$ in Equation (10) was introduced by Mann (1945).
The Special Case of a Simple Linear Model: Estimating the Slope
Consider the simple linear regression model
$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i \;\;\;\;\;\; (12)$$
where $\beta_0$ denotes the intercept, $\beta_1$ denotes the slope,
$i = 1, 2, \ldots, n$, and the $\epsilon$'s are assumed to be
independent and identically distributed random variables.
This is a special case of dependence between the $X$'s and the $Y$'s, and
the null hypothesis of a zero slope can be tested using Kendall's test statistic
$S$ (Equation (3) or (10) above) and the associated z-statistic
(Equation (5) or (8) above) (Hollander and Wolfe, 1999, pp.415--420).
Theil (1950) proposed the following nonparametric estimator of the slope:
$$\hat{\beta}_1 = Median(\frac{Y_j - Y_i}{X_j - X_i}); \;\; i < j \;\;\;\;\;\; (13)$$
Note that the computation of the estimated slope involves computing
$$N = {n \choose 2} = \frac{n(n-1)}{2} \;\;\;\;\;\; (14)$$
two-point estimated slopes and taking their median. In the case of tied $X$
values, the two-point slope is undefined for any pair with $X_i = X_j$, and the
median is taken over the slopes that are defined:
$$\hat{\beta}_1 = Median(\frac{Y_j - Y_i}{X_j - X_i}); \;\; i < j, \; X_i \ne X_j \;\;\;\;\;\; (15)$$
The intercept is estimated by Conover's estimator:
$$\hat{\beta}_0 = Median(Y) - \hat{\beta}_1 Median(X) \;\;\;\;\;\; (16)$$
kendallTrendTest
always returns estimates of
slope and intercept assuming a linear model (Equation (12)), while the p-value
is based on Kendall's tau, which is testing for the broader alternative of any
kind of dependence between the $X$'s and $Y$'s.
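The Theil/Sen slope of Equation (13), together with a median-based intercept (the form associated with Conover's estimator named in the function's output), can be sketched on hypothetical data:

```r
# Hypothetical data (no tied x values)
x <- c(1, 2, 3, 4, 5)
y <- c(1.1, 2.3, 2.9, 4.2, 5.0)

# All N = choose(n, 2) two-point slopes, Equation (13)
ij <- combn(length(x), 2)       # columns are the pairs (i, j) with i < j
two.point <- (y[ij[2, ]] - y[ij[1, ]]) / (x[ij[2, ]] - x[ij[1, ]])

beta1.hat <- median(two.point)                  # estimated slope
beta0.hat <- median(y) - beta1.hat * median(x)  # median-based intercept
```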
Confidence Interval for the Slope
Theil (1950) and Sen (1968) proposed methods to compute a confidence interval for
the true slope, assuming the linear model of Equation (12) (see
Hollander and Wolfe, 1999, pp.421--422). Gilbert (1987, p.218) illustrates a
simpler method than the one given by Sen (1968) that is based on a normal
approximation. Gilbert's (1987) method is an extension of the one given in
Hollander and Wolfe (1999, p.424) that allows for ties and/or multiple
observations per time period. This method is valid for a sample size as small as
$n=10$ unless there are several tied observations.
Let $N'$ denote the number of defined two-point estimated slopes that are used
in Equation (15) above (if there are no tied $X$ values then $N' = N$), and
let $\hat{\beta}_{1_{(1)}}, \hat{\beta}_{1_{(2)}}, \ldots, \hat{\beta}_{1_{(N')}}$
denote the $N'$ ordered slopes. For Gilbert's (1987) method, a
$100(1-\alpha)\%$ two-sided confidence interval for the true slope is given by:
$$[\hat{\beta}_{1_{(M1)}}, \hat{\beta}_{1_{(M2+1)}}] \;\;\;\;\;\; (17)$$
where
$$M1 = \frac{N' - C_{\alpha}}{2} \;\;\;\;\;\; (18)$$
$$M2 = \frac{N' + C_{\alpha}}{2} \;\;\;\;\;\; (19)$$
$$C_\alpha = z_{1 - \alpha/2} \sqrt{Var(S)} \;\;\;\;\;\; (20)$$
$Var(S)$ is defined in Equations (7), (9), or (11), and
$z_p$ denotes the $p$'th quantile of the standard normal distribution.
One-sided confidence intervals may be computed in a similar fashion.
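Equations (17)-(20) can be sketched in R on hypothetical (untied) data, interpolating between adjacent ordered slopes at the non-integer ranks:

```r
# Hypothetical data with no ties, n = 8
x <- 1:8
y <- c(1.2, 2.1, 2.9, 4.3, 4.9, 6.2, 6.8, 8.1)
n <- length(y)

# Ordered two-point slopes; N' = N = choose(n, 2) since x has no ties
ij <- combn(n, 2)
slopes <- sort((y[ij[2, ]] - y[ij[1, ]]) / (x[ij[2, ]] - x[ij[1, ]]))
N <- length(slopes)

var.S <- n * (n - 1) * (2 * n + 5) / 18        # Equation (7)
C.alpha <- qnorm(1 - 0.05 / 2) * sqrt(var.S)   # Equation (20), 95% interval
M1 <- (N - C.alpha) / 2                        # Equation (18)
M2 <- (N + C.alpha) / 2                        # Equation (19)

# Equation (17): interpolate at the (usually non-integer) ranks M1 and M2 + 1
lcl <- approx(1:N, slopes, xout = M1)$y
ucl <- approx(1:N, slopes, xout = M2 + 1)$y
c(lcl, ucl)
```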
Usually the quantities $M1$ and $M2$ will not be integers.
Gilbert (1987, p.219) suggests interpolating between adjacent values in this case,
which is what the function kendallTrendTest
does.

See Also

cor.test, kendallSeasonalTrendTest, htest.object.

Examples

# Reproduce Example 17-6 on page 17-33 of USEPA (2009).  This example
# tests for trend in sulfate concentrations (ppm) collected at various
# months between 1989 and 1996.
head(EPA.09.Ex.17.6.sulfate.df)
# Sample.No Year Month Sampling.Date Date Sulfate.ppm
#1 1 89 6 89.6 1989-06-01 480
#2 2 89 8 89.8 1989-08-01 450
#3 3 90 1 90.1 1990-01-01 490
#4 4 90 3 90.3 1990-03-01 520
#5 5 90 6 90.6 1990-06-01 485
#6 6 90 8 90.8 1990-08-01 510
# Plot the data
#--------------
dev.new()
with(EPA.09.Ex.17.6.sulfate.df,
plot(Sampling.Date, Sulfate.ppm, pch = 15, ylim = c(400, 900),
xlab = "Sampling Date", ylab = "Sulfate Conc (ppm)",
main = "Figure 17-6. Time Series Plot of
Sulfate Concentrations (ppm)")
)
Sulfate.fit <- lm(Sulfate.ppm ~ Sampling.Date,
data = EPA.09.Ex.17.6.sulfate.df)
abline(Sulfate.fit, lty = 2)
# Perform the Kendall test for trend
#-----------------------------------
kendallTrendTest(Sulfate.ppm ~ Sampling.Date,
data = EPA.09.Ex.17.6.sulfate.df)
#Results of Hypothesis Test
#--------------------------
#
#Null Hypothesis: tau = 0
#
#Alternative Hypothesis: True tau is not equal to 0
#
#Test Name: Kendall's Test for Trend
# (with continuity correction)
#
#Estimated Parameter(s): tau = 0.7667984
# slope = 26.6666667
# intercept = -1909.3333333
#
#Estimation Method: slope: Theil/Sen Estimator
# intercept: Conover's Estimator
#
#Data: y = Sulfate.ppm
# x = Sampling.Date
#
#Data Source: EPA.09.Ex.17.6.sulfate.df
#
#Sample Size: 23
#
#Test Statistic: z = 5.107322
#
#P-value: 3.267574e-07
#
#Confidence Interval for: slope
#
#Confidence Interval Method: Gilbert's Modification
# of Theil/Sen Method
#
#Confidence Interval Type: two-sided
#
#Confidence Level: 95%
#
#Confidence Interval: LCL = 20.00000
# UCL = 35.71182
# Clean up
#---------
rm(Sulfate.fit)
graphics.off()