linearTrendTestPower(n, x = lapply(n, seq), slope.over.sigma = 0, alpha = 0.05,
alternative = "two.sided", approx = FALSE)
n
vector of positive integers indicating the sample size(s). All values of n
must be larger than 2. This argument is ignored when x
is supplied.
Missing (NA
), undefined (NaN
), and infinite (Inf
, -Inf
) values are not allowed.
x
numeric vector of values of the predictor variable, or a list in which each
component is a numeric vector of values of the predictor variable.
The default value is x=lapply(n, seq)
.
slope.over.sigma
numeric vector specifying the ratio of the true slope to the standard deviation
of the error terms (the "scaled slope"). The default value is slope.over.sigma=0
.
alpha
numeric vector of numbers between 0 and 1 indicating the Type I error level(s)
associated with the hypothesis test. The default value is alpha=0.05
.
alternative
character string indicating the kind of alternative hypothesis. The possible
values are "two.sided"
(the default), "greater"
, and "less"
.
approx
logical scalar indicating whether to compute the power based on an approximation
to the non-central t-distribution. The default value is approx=FALSE
.
If x
is a vector, it is converted into a list with one
component. If the arguments n
, x
, slope.over.sigma
, and
alpha
are not all the same length, they are replicated to be the same
length as the length of the longest argument.
Basic Model
Consider the simple linear regression model
$$Y = \beta_0 + \beta_1 X + \epsilon \;\;\;\;\;\; (1)$$
where $X$ denotes the predictor variable (observed without error),
$\beta_0$ denotes the intercept, $\beta_1$ denotes the slope, and the
error term $\epsilon$ is assumed to be a random variable from a normal
distribution with mean 0 and standard deviation $\sigma$. Let
$$(\underline{x}, \underline{y}) = (x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n) \;\;\;\;\;\; (2)$$
denote $n$ independent observed $(X,Y)$ pairs from the model (1).
Often in environmental data analysis, we are interested in determining whether there
is a trend in some indicator variable over time. In this case, the predictor
variable $X$ is time (e.g., day, month, quarter, year, etc.), and the $n$
values of the response variable $Y$ represent measurements taken over time.
The slope then represents the change in the average of the response variable per
one unit of time.
When the argument x
is a numeric vector, it represents the
$n$ values of the predictor variable. When the argument x
is a
list, each component of x
is a numeric vector that represents a set of values
of the predictor variable (and the number of elements may vary by component).
By default, the argument x
is a list for which the i'th component is simply
the integers from 1 to the value of the i'th element of the argument n
,
representing, for example, Day 1, Day 2, ..., Day n[i]
.
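As a concrete illustration of this default, lapply(n, seq) builds one component
per element of n (a minimal base-R sketch; the values of n below are illustrative):

```r
# Default construction of x: the i'th component is the integers 1 to n[i]
n <- c(4L, 6L)
x <- lapply(n, seq)
# x is list(1:4, 1:6), i.e., Day 1 through Day n[i] for each component
```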
In the discussion that follows, be sure not to confuse the intercept and slope
coefficients $\beta_0$ and $\beta_1$ with the Type II error of the
hypothesis test, which is denoted by $\beta$.
Estimation of Coefficients and Confidence Interval for Slope
The standard least-squares estimators of the slope and intercept are given by:
$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} \;\;\;\;\;\; (3)$$
$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \;\;\;\;\;\; (4)$$
where
$$S_{xy} = \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) \;\;\;\;\;\; (5)$$
$$S_{xx} = \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\;\;\; (6)$$
$$\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \;\;\;\;\;\; (7)$$
$$\bar{y} = \frac{1}{n} \sum_{i=1}^n y_i \;\;\;\;\;\; (8)$$
(Draper and Smith, 1998, p.25; Zar, 2010, pp.332-334; Berthouex and Brown, 2002, p.297;
Helsel and Hirsch, 1992, p.226). The estimator of slope in Equation (3) has a normal
distribution with mean equal to the true slope, and variance given by:
$$Var(\hat{\beta}_1) = \sigma_{\hat{\beta}_1}^2 = \frac{\sigma^2}{S_{xx}} \;\;\;\;\;\; (9)$$
(Draper and Smith, 1998, p.35; Zar, 2010, p.341; Berthouex and Brown, 2002, p.299;
Helsel and Hirsch, 1992, p.227). Thus, a $(1-\alpha)100\%$ two-sided confidence
interval for the slope is given by:
$$[ \hat{\beta}_1 - t_{n-2}(1-\alpha/2) \hat{\sigma}_{\hat{\beta}_1}, \;\; \hat{\beta}_1 + t_{n-2}(1-\alpha/2) \hat{\sigma}_{\hat{\beta}_1} ] \;\;\;\;\;\; (10)$$
where
$$\hat{\sigma}_{\hat{\beta}_1} = \frac{\hat{\sigma}}{\sqrt{S_{xx}}} \;\;\;\;\;\; (11)$$
$$\hat{\sigma}^2 = s^2 = \frac{1}{n-2} \sum_{i=1}^n (y_i - \hat{y}_i)^2 \;\;\;\;\;\; (12)$$
$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i \;\;\;\;\;\; (13)$$
and $t_{\nu}(p)$ denotes the $p$'th quantile of
Student's t-distribution with $\nu$ degrees of freedom
(Draper and Smith, 1998, p.36; Zar, 2010, p.343; Berthouex and Brown, 2002, p.300;
Helsel and Hirsch, 1992, p.240).
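Equations (3), (4), and (10)-(13) can be checked directly against R's lm and
confint. This is a minimal sketch; the simulated data (seed, sample size, and
true coefficients) are illustrative assumptions:

```r
set.seed(47)
n <- 12
x <- 1:n
y <- 2 + 0.5 * x + rnorm(n)            # simulated responses from model (1)

x.bar <- mean(x); y.bar <- mean(y)     # Equations (7) and (8)
Sxx <- sum((x - x.bar)^2)              # Equation (6)
Sxy <- sum((x - x.bar) * (y - y.bar))  # Equation (5)
b1 <- Sxy / Sxx                        # Equation (3): slope estimate
b0 <- y.bar - b1 * x.bar               # Equation (4): intercept estimate
s  <- sqrt(sum((y - (b0 + b1 * x))^2) / (n - 2))  # Equation (12)
se.b1 <- s / sqrt(Sxx)                 # Equation (11)
ci <- b1 + c(-1, 1) * qt(0.975, n - 2) * se.b1    # Equation (10), alpha = 0.05

fit <- lm(y ~ x)
all.equal(unname(coef(fit)), c(b0, b1))       # TRUE
all.equal(unname(confint(fit)["x", ]), ci)    # TRUE
```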
Testing for a Non-Zero Slope
Consider the null hypothesis of a zero slope coefficient:
$$H_0: \beta_1 = 0 \;\;\;\;\;\; (14)$$
The three possible alternative hypotheses are the upper one-sided alternative
(alternative="greater"):
$$H_a: \beta_1 > 0 \;\;\;\;\;\; (15)$$
the lower one-sided alternative (alternative="less")
$$H_a: \beta_1 < 0 \;\;\;\;\;\; (16)$$
and the two-sided alternative (alternative="two.sided")
$$H_a: \beta_1 \ne 0 \;\;\;\;\;\; (17)$$
The test of the null hypothesis (14) versus any of the three alternatives (15)-(17) is
based on the Student t-statistic:
$$t = \frac{\hat{\beta}_1}{\hat{\sigma}_{\hat{\beta}_1}} = \frac{\hat{\beta}_1}{s/\sqrt{S_{xx}}} \;\;\;\;\;\; (18)$$
Under the null hypothesis (14), the t-statistic in (18) follows a
Student's t-distribution with $n-2$ degrees of freedom
(Draper and Smith, 1998, p.36; Zar, 2010, p.341;
Helsel and Hirsch, 1992, pp.238-239).
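The t-statistic in Equation (18) is exactly what summary.lm reports for the
slope coefficient. A minimal sketch with illustrative simulated data:

```r
set.seed(23)
x <- 1:10
y <- 1 + 0.3 * x + rnorm(10)
fit <- lm(y ~ x)

Sxx <- sum((x - mean(x))^2)                      # Equation (6)
b1 <- sum((x - mean(x)) * (y - mean(y))) / Sxx   # Equation (3)
s <- summary(fit)$sigma                          # Equation (12)
tstat <- b1 / (s / sqrt(Sxx))                    # Equation (18)
pval <- 2 * pt(-abs(tstat), length(x) - 2)       # two-sided p-value, n - 2 df

cf <- summary(fit)$coefficients
all.equal(tstat, unname(cf["x", "t value"]))     # TRUE
all.equal(pval, unname(cf["x", "Pr(>|t|)"]))     # TRUE
```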
The formula for the power of the test of a zero slope depends on which alternative
is being tested.
The two subsections below describe exact and approximate formulas for the power of
the test. Note that none of the equations for the power of the t-test
requires knowledge of the values $\beta_1$ or $\sigma$
(the population standard deviation of the error terms), only the ratio
$\beta_1/\sigma$. The argument slope.over.sigma
is this ratio, and it is referred to as the scaled slope.
Exact Power Calculations (approx=FALSE)
This subsection describes the exact formulas for the power of the t-test for a
zero slope.
Upper one-sided alternative (alternative="greater")
The standard Student's t-test rejects the null hypothesis (14) in favor of the
upper alternative hypothesis (15) at level-$\alpha$ if
$$t \ge t_{\nu}(1 - \alpha) \;\;\;\;\;\; (19)$$
where
$$\nu = n - 2 \;\;\;\;\;\; (20)$$
and, as noted previously, $t_{\nu}(p)$ denotes the $p$'th quantile of
Student's t-distribution with $\nu$ degrees of freedom.
The power of this test, denoted by $1-\beta$, where $\beta$ denotes the
probability of a Type II error, is given by:
$$1 - \beta = Pr[t_{\nu, \Delta} \ge t_{\nu}(1 - \alpha)] = 1 - G[t_{\nu}(1 - \alpha), \nu, \Delta] \;\;\;\;\;\; (21)$$
where
$$\Delta = \sqrt{S_{xx}} \frac{\beta_1}{\sigma} \;\;\;\;\;\; (22)$$
and $t_{\nu, \Delta}$ denotes a
non-central Student's t-random variable with
$\nu$ degrees of freedom and non-centrality parameter $\Delta$, and
$G(x, \nu, \Delta)$ denotes the cumulative distribution function of this
random variable evaluated at $x$ (Johnson et al., 1995, pp.508-510).
Note that when the predictor variable $X$ represents equally-spaced measures
of time (e.g., days, months, quarters, etc.) and
$$x_i = i, \;\; i = 1, 2, \ldots, n \;\;\;\;\;\; (23)$$
then the non-centrality parameter in Equation (22) becomes:
$$\Delta = \sqrt{\frac{(n-1)n(n+1)}{12}} \frac{\beta_1}{\sigma} \;\;\;\;\;\; (24)$$
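Equation (24) follows from Equation (22) because, for $x_i = i$, $S_{xx}$
reduces to $(n-1)n(n+1)/12$. A quick numerical check (n = 15 is an arbitrary
choice):

```r
n <- 15
x <- 1:n
Sxx <- sum((x - mean(x))^2)      # Equation (6), for equally-spaced times
Sxx                              # 280
(n - 1) * n * (n + 1) / 12       # 280: the closed form used in Equation (24)
```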
Lower one-sided alternative (alternative="less")
The standard Student's t-test rejects the null hypothesis (14) in favor of the
lower alternative hypothesis (16) at level-$\alpha$ if
$$t \le t_{\nu}(\alpha) \;\;\;\;\;\; (25)$$
and the power of this test is given by:
$$1 - \beta = Pr[t_{\nu, \Delta} \le t_{\nu}(\alpha)] = G[t_{\nu}(\alpha), \nu, \Delta] \;\;\;\;\;\; (26)$$
Two-sided alternative (alternative="two.sided")
The standard Student's t-test rejects the null hypothesis (14) in favor of the
two-sided alternative hypothesis (17) at level-$\alpha$ if
$$|t| \ge t_{\nu}(1 - \alpha/2) \;\;\;\;\;\; (27)$$
and the power of this test is given by:
$$1 - \beta = Pr[t_{\nu, \Delta} \le t_{\nu}(\alpha/2)] + Pr[t_{\nu, \Delta} \ge t_{\nu}(1 - \alpha/2)]$$
$$= G[t_{\nu}(\alpha/2), \nu, \Delta] + 1 - G[t_{\nu}(1 - \alpha/2), \nu, \Delta] \;\;\;\;\;\; (28)$$
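Equation (28) can be evaluated with R's pt, which supports a non-centrality
parameter. The sketch below reproduces the exact two-sided power for n = 15,
alpha = 0.05, and scaled slope 0.1 (values chosen to match the examples later
in this help file):

```r
n <- 15; nu <- n - 2; alpha <- 0.05
slope.over.sigma <- 0.1
Sxx <- sum((1:n - mean(1:n))^2)
Delta <- sqrt(Sxx) * slope.over.sigma            # Equation (22)
# Equation (28): G(., nu, Delta) is pt(., nu, ncp = Delta)
power <- pt(qt(alpha/2, nu), nu, ncp = Delta) +
  1 - pt(qt(1 - alpha/2, nu), nu, ncp = Delta)
round(power, 2)                                  # 0.34
```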
The power of the t-test given in Equation (28) can also be expressed in terms of the
cumulative distribution function of the non-central F-distribution
as follows. Let $F_{\nu_1, \nu_2, \Delta}$ denote a
non-central F random variable with $\nu_1$ and
$\nu_2$ degrees of freedom and non-centrality parameter $\Delta$, and let
$H(x, \nu_1, \nu_2, \Delta)$ denote the cumulative distribution function of this
random variable evaluated at $x$. Also, let $F_{\nu_1, \nu_2}(p)$ denote
the $p$'th quantile of the central F-distribution with $\nu_1$ and
$\nu_2$ degrees of freedom. It can be shown that
$$(t_{\nu, \Delta})^2 \cong F_{1, \nu, \Delta^2} \;\;\;\;\;\; (29)$$
where $\cong$ denotes "equal in distribution". Thus, the power of the two-sided
t-test given in Equation (28) can also be written as:
$$1 - \beta = Pr[(t_{\nu, \Delta})^2 \ge F_{1, \nu}(1 - \alpha)] \;\;\;\;\;\; (30)$$
$$= 1 - H[F_{1, \nu}(1 - \alpha), 1, \nu, \Delta^2] \;\;\;\;\;\; (31)$$
Approximate Power Calculations (approx=TRUE)
Zar (2010, pp.115--118) presents an approximation to the power for the t-test
given in Equations (21), (26), and (28) above. His approximation to the power
can be derived by using the approximation
$$\sqrt{S_{xx}} \frac{\beta_1}{s} \approx \sqrt{S_{xx}} \frac{\beta_1}{\sigma} = \Delta \;\;\;\;\;\; (32)$$
where $\approx$ denotes "approximately equal to". This approximation can be
summarized in terms of the cumulative distribution function of the
non-central t-distribution as:
$$G(x, \nu, \Delta) \approx G(x - \Delta, \nu, 0) = G(x - \Delta, \nu) \;\;\;\;\;\; (33)$$
where $G(x, \nu)$ denotes the cumulative distribution function of the central
Student's t-distribution with $\nu$ degrees of freedom evaluated at $x$.
Upper one-sided alternative (alternative="greater")
The power for the upper one-sided alternative (15) given in Equation (21) can be
approximated as:
$$1 - \beta = Pr[t \ge t_{\nu}(1 - \alpha)]$$
$$= Pr[\frac{\hat{\beta}_1}{s/\sqrt{S_{xx}}} \ge t_{\nu}(1 - \alpha) - \sqrt{S_{xx}}\frac{\beta_1}{s}]$$
$$\approx Pr[t_{\nu} \ge t_{\nu}(1 - \alpha) - \Delta]$$
$$= 1 - Pr[t_{\nu} \le t_{\nu}(1 - \alpha) - \Delta]$$
$$= 1 - G[t_{\nu}(1-\alpha) - \Delta, \nu] \;\;\;\;\;\; (34)$$
where $t_{\nu}$ denotes a central Student's t-random variable with $\nu$
degrees of freedom.
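Equation (34) amounts to shifting a central t quantile by $\Delta$. A sketch
comparing it with the exact power of Equation (21); n, alpha, and the scaled
slope are illustrative choices:

```r
n <- 15; nu <- n - 2; alpha <- 0.05
Delta <- sqrt(sum((1:n - mean(1:n))^2)) * 0.1          # Equation (22)
exact  <- 1 - pt(qt(1 - alpha, nu), nu, ncp = Delta)   # Equation (21)
approx <- 1 - pt(qt(1 - alpha, nu) - Delta, nu)        # Equation (34)
c(exact = exact, approx = approx)                      # close, not identical
```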
Lower one-sided alternative (alternative="less")
The power for the lower one-sided alternative (16) given in Equation (26) can be
approximated as:
$$1 - \beta = Pr[t \le t_{\nu}(\alpha)]$$
$$= Pr[\frac{\hat{\beta}_1}{s/\sqrt{S_{xx}}} \le t_{\nu}(\alpha) - \sqrt{S_{xx}}\frac{\beta_1}{s}]$$
$$\approx Pr[t_{\nu} \le t_{\nu}(\alpha) - \Delta]$$
$$= G[t_{\nu}(\alpha) - \Delta, \nu] \;\;\;\;\;\; (35)$$
Two-sided alternative (alternative="two.sided")
The power for the two-sided alternative (17) given in Equation (28) can be
approximated as:
$$1 - \beta = Pr[t \le t_{\nu}(\alpha/2)] + Pr[t \ge t_{\nu}(1 - \alpha/2)]$$
$$= Pr[\frac{\hat{\beta}_1}{s/\sqrt{S_{xx}}} \le t_{\nu}(\alpha/2) - \sqrt{S_{xx}}\frac{\beta_1}{s}] + Pr[\frac{\hat{\beta}_1}{s/\sqrt{S_{xx}}} \ge t_{\nu}(1 - \alpha/2) - \sqrt{S_{xx}}\frac{\beta_1}{s}]$$
$$\approx Pr[t_{\nu} \le t_{\nu}(\alpha/2) - \Delta] + Pr[t_{\nu} \ge t_{\nu}(1 - \alpha/2) - \Delta]$$
$$= G[t_{\nu}(\alpha/2) - \Delta, \nu] + 1 - G[t_{\nu}(1-\alpha/2) - \Delta, \nu] \;\;\;\;\;\; (36)$$
See Also
linearTrendTestN, linearTrendTestScaledMds, plotLinearTrendTestDesign,
lm, summary.lm, kendallTrendTest,
Power and Sample Size, Normal, t.test.
Examples
# Look at how the power of the t-test for zero slope increases with increasing
# sample size:
seq(5, 30, by = 5)
#[1] 5 10 15 20 25 30
power <- linearTrendTestPower(n = seq(5, 30, by = 5), slope.over.sigma = 0.1)
round(power, 2)
#[1] 0.06 0.13 0.34 0.68 0.93 1.00
#----------
# Repeat the last example, but compute the approximate power instead of the
# exact:
power <- linearTrendTestPower(n = seq(5, 30, by = 5), slope.over.sigma = 0.1,
approx = TRUE)
round(power, 2)
#[1] 0.05 0.11 0.32 0.68 0.93 0.99
#----------
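The approximate result above can be reproduced directly from Equation (36)
using only central t quantiles (a sketch for the n = 15 entry):

```r
n <- 15; nu <- n - 2; alpha <- 0.05
Delta <- sqrt(sum((1:n - mean(1:n))^2)) * 0.1   # Equation (22)
power.approx <- pt(qt(alpha/2, nu) - Delta, nu) +
  1 - pt(qt(1 - alpha/2, nu) - Delta, nu)       # Equation (36)
round(power.approx, 2)                          # 0.32
```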
# Look at how the power of the t-test for zero slope increases with increasing
# scaled slope:
seq(0.05, 0.2, by = 0.05)
#[1] 0.05 0.10 0.15 0.20
power <- linearTrendTestPower(15, slope.over.sigma = seq(0.05, 0.2, by = 0.05))
round(power, 2)
#[1] 0.12 0.34 0.64 0.87
#----------
# Look at how the power of the t-test for zero slope increases with increasing
# values of Type I error:
power <- linearTrendTestPower(20, slope.over.sigma = 0.1,
alpha = c(0.001, 0.01, 0.05, 0.1))
round(power, 2)
#[1] 0.14 0.41 0.68 0.80
#----------
# Show that for a simple regression model, you get a greater power of detecting
# a non-zero slope if you take all the observations at two endpoints, rather than
# spreading the observations evenly between two endpoints.
# (Note: This design usually cannot work with environmental monitoring data taken
# over time since usually observations taken close together in time are not
# independent.)
linearTrendTestPower(x = 1:10, slope.over.sigma = 0.1)
#[1] 0.1265976
linearTrendTestPower(x = c(rep(1, 5), rep(10, 5)), slope.over.sigma = 0.1)
#[1] 0.2413823
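The power difference between these two designs comes entirely from S_xx in the
non-centrality parameter of Equation (22): stacking observations at the two
endpoints maximizes S_xx for a fixed range of x. A sketch reproducing both
powers with pt via Equations (22) and (28):

```r
# Exact two-sided power of the test of zero slope, Equations (22) and (28)
power2 <- function(x, slope.over.sigma, alpha = 0.05) {
  nu <- length(x) - 2
  Delta <- sqrt(sum((x - mean(x))^2)) * slope.over.sigma  # Equation (22)
  pt(qt(alpha/2, nu), nu, ncp = Delta) +
    1 - pt(qt(1 - alpha/2, nu), nu, ncp = Delta)          # Equation (28)
}
sum((1:10 - mean(1:10))^2)                       # S_xx =  82.5 (evenly spread)
sum((c(rep(1, 5), rep(10, 5)) - 5.5)^2)          # S_xx = 202.5 (endpoints)
power2(1:10, 0.1)                                # 0.1265976
power2(c(rep(1, 5), rep(10, 5)), 0.1)            # 0.2413823
```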
#==========
# Clean up
#---------
rm(power)