Perform Rosner's generalized extreme Studentized deviate test for up to \(k\) potential outliers in a dataset, assuming the data without any outliers come from a normal (Gaussian) distribution.
rosnerTest(x, k = 3, alpha = 0.05, warn = TRUE)
x: numeric vector of observations. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed. There must be at least 10 non-missing, finite observations in x.
k: positive integer indicating the number of suspected outliers. The argument k must be between 1 and \(n-2\), where \(n\) denotes the number of non-missing, finite values in the argument x. The default value is k=3.
alpha: numeric scalar between 0 and 1 indicating the Type I error associated with the test of hypothesis. The default value is alpha=0.05.
warn: logical scalar indicating whether to issue a warning (warn=TRUE; the default) when the number of non-missing, finite values in x and the value of k are such that the assumed Type I error level might not be maintained. See the DETAILS section below.
A list of class "gofOutlier" containing the results of the hypothesis test. See the help file for gofOutlier.object for details.
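For example, individual components of the result can be extracted from the returned list. The following is a minimal sketch: the component names n.outliers and all.stats, and the column names of all.stats, are assumed here from the printed output in the EXAMPLES section below; see the gofOutlier.object help file for the authoritative list.

library(EnvStats)
set.seed(123)
res <- rosnerTest(c(rnorm(30), 12, 15), k = 2, warn = FALSE)
res$n.outliers                                 # number of outliers detected
res$all.stats                                  # data frame with Mean.i, SD.i, R.i+1, lambda.i+1, Outlier
res$all.stats$Obs.Num[res$all.stats$Outlier]   # positions of the detected outliers in the data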
Let \(x_1, x_2, \ldots, x_n\) denote the \(n\) observations. We assume that \(n-k\) of these observations come from the same normal (Gaussian) distribution, and that the \(k\) most “extreme” observations may or may not represent observations from a different distribution. Let \(x^{*}_1, x^{*}_2, \ldots, x^{*}_{n-i}\) denote the \(n-i\) observations left after omitting the \(i\) most extreme observations, where \(i = 0, 1, \ldots, k-1\). Let \(\bar{x}^{(i)}\) and \(s^{(i)}\) denote the mean and standard deviation, respectively, of the \(n-i\) observations that remain after removing the \(i\) most extreme observations. Thus, \(\bar{x}^{(0)}\) and \(s^{(0)}\) denote the mean and standard deviation of the full sample, and in general $$\bar{x}^{(i)} = \frac{1}{n-i}\sum_{j=1}^{n-i} x^{*}_j \;\;\;\;\;\; (1)$$ $$s^{(i)} = \sqrt{\frac{1}{n-i-1} \sum_{j=1}^{n-i} (x^{*}_j - \bar{x}^{(i)})^2} \;\;\;\;\;\; (2)$$
For a specified value of \(i\), the most extreme observation \(x^{(i)}\) is the one that is the greatest distance from the mean for that data set, i.e., $$x^{(i)} = \max_{j=1,2,\ldots,n-i} |x^{*}_j - \bar{x}^{(i)}| \;\;\;\;\;\; (3)$$ Thus, an extreme observation may be the smallest or the largest one in that data set.
Rosner's test is based on the \(k\) statistics \(R_1, R_2, \ldots, R_k\), which represent the extreme Studentized deviates computed from successively reduced samples of size \(n, n-1, \ldots, n-k+1\): $$R_{i+1} = \frac{|x^{(i)} - \bar{x}^{(i)}|}{s^{(i)}} \;\;\;\;\;\; (4)$$ Critical values for \(R_{i+1}\) are denoted \(\lambda_{i+1}\) and are computed as: $$\lambda_{i+1} = \frac{t_{p, n-i-2} \, (n-i-1)}{\sqrt{(n-i-2 + t^2_{p, n-i-2}) \, (n-i)}} \;\;\;\;\;\; (5)$$ where \(t_{p, \nu}\) denotes the \(p\)'th quantile of Student's t-distribution with \(\nu\) degrees of freedom, and in this case $$p = 1 - \frac{\alpha/2}{n - i} \;\;\;\;\;\; (6)$$ where \(\alpha\) denotes the Type I error level.
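As a small check on Equations (5) and (6), the critical values can be computed directly with qt(). The helper below is a minimal sketch in base R (the name rosner.lambda is hypothetical and this is not code from the package):

# Critical value lambda_{i+1} for sample size n, after removing i extreme
# values, at Type I error level alpha (Equations (5) and (6)).
rosner.lambda <- function(n, i, alpha = 0.05) {
  p  <- 1 - alpha / (2 * (n - i))                        # Equation (6)
  tq <- qt(p, df = n - i - 2)                            # Student's t quantile
  tq * (n - i - 1) / sqrt((n - i - 2 + tq^2) * (n - i))  # Equation (5)
}

rosner.lambda(n = 33, i = 0:3, alpha = 0.05)
# Should agree with the lambda.i+1 column (about 2.95, 2.94, 2.92, 2.91)
# reported for the first example in the EXAMPLES section below.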
The algorithm for determining the number of outliers is as follows:
* Compare \(R_k\) with \(\lambda_k\). If \(R_k > \lambda_k\) then conclude the \(k\) most extreme values are outliers.

* If \(R_k \le \lambda_k\) then compare \(R_{k-1}\) with \(\lambda_{k-1}\). If \(R_{k-1} > \lambda_{k-1}\) then conclude the \(k-1\) most extreme values are outliers.

* Continue in this fashion until a certain number of outliers have been identified or Rosner's test finds no outliers at all (see the sketch below).
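The sketch below implements this procedure in base R (the name rosner.sketch is hypothetical and this is not the internal code of rosnerTest); it computes the \(R_{i+1}\) and \(\lambda_{i+1}\) values and then applies the backward comparison rule:

rosner.sketch <- function(x, k = 3, alpha = 0.05) {
  x <- x[is.finite(x)]                        # drop NA, NaN, Inf, -Inf
  n <- length(x)
  R <- lambda <- numeric(k)
  y <- x
  for (i in 0:(k - 1)) {
    m <- mean(y); s <- sd(y)
    j <- which.max(abs(y - m))                # most extreme remaining observation
    R[i + 1] <- abs(y[j] - m) / s             # Equation (4)
    p  <- 1 - alpha / (2 * (n - i))           # Equation (6)
    tq <- qt(p, df = n - i - 2)
    lambda[i + 1] <- tq * (n - i - 1) /
      sqrt((n - i - 2 + tq^2) * (n - i))      # Equation (5)
    y <- y[-j]                                # remove it and continue
  }
  exceed <- which(R > lambda)                 # backward rule: largest i with R_i > lambda_i
  n.outliers <- if (length(exceed) > 0) max(exceed) else 0
  list(R = R, lambda = lambda, n.outliers = n.outliers)
}

Applied to the data generated in the first example of the EXAMPLES section (set.seed(250); dat <- c(rnorm(30, mean = 3, sd = 2), rnorm(3, mean = 10, sd = 1))), the R and lambda values returned by this sketch should match the R.i+1 and lambda.i+1 columns reported there.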
Based on a study using N=1,000 simulations, Rosner's (1983) Table 1 shows the estimated true Type I error of declaring at least one outlier when none exists for various sample sizes \(n\) ranging from 10 to 100, and the declared maximum number of outliers \(k\) ranging from 1 to 10. Based on that table, Rosner (1983) concluded that for an assumed Type I error level of 0.05, as long as \(n \ge 25\), the estimated \(\alpha\) levels are quite close to 0.05, and that similar results were obtained assuming a Type I error level of 0.01. However, the tables below are an expanded version of Rosner's (1983) Table 1 and show results based on N=10,000 simulations. You can see that for an assumed Type I error of 0.05, the test maintains the Type I error fairly well for sample sizes as small as \(n = 3\) as long as \(k = 1\), and for \(n \ge 15\) as long as \(k \le 2\). Also, for an assumed Type I error of 0.01, the test maintains the Type I error fairly well for sample sizes as small as \(n = 15\) as long as \(k \le 7\).
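A simulation of this kind can be sketched with rosnerTest itself. The example below only illustrates the approach, not the code used to produce the tables, and it assumes the returned gofOutlier object exposes the n.outliers component (see gofOutlier.object):

library(EnvStats)
set.seed(47)
n <- 15; k <- 2; N <- 10000   # one (n, k) combination; N matches the tables below
declared <- replicate(N, {
  x <- rnorm(n)                                      # normal data with no outliers
  rosnerTest(x, k = k, warn = FALSE)$n.outliers > 0  # at least one outlier declared?
})
mean(declared)   # estimated Type I error; compare with the n = 15, k = 2 row of Table 1c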
Based on these simulation results, when warn=TRUE, a warning is issued for the following cases, indicating that the assumed Type I error may not be correct:

* alpha is greater than 0.01, the sample size is less than 15, and k is greater than 1.

* alpha is greater than 0.01, the sample size is at least 15 and less than 25, and k is greater than 2.

* alpha is less than or equal to 0.01, the sample size is less than 15, and k is greater than 1.

* k is greater than 10, or greater than the floor of half of the sample size (i.e., greater than the greatest integer less than or equal to half of the sample size). A warning is given here because simulations have not been done for this case.
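These cases can be written out as a simple predicate. The sketch below (the helper name warning.case is hypothetical) merely restates the rules above and is not the check used inside rosnerTest:

# TRUE if the combination of sample size n, number of suspected outliers k,
# and Type I error alpha falls into one of the warning cases listed above.
warning.case <- function(n, k, alpha) {
  (alpha >  0.01 && n <  15           && k > 1) ||
  (alpha >  0.01 && n >= 15 && n < 25 && k > 2) ||
  (alpha <= 0.01 && n <  15           && k > 1) ||
  k > 10 || k > floor(n / 2)
}

warning.case(n = 20, k = 3, alpha = 0.05)   # TRUE: the second case applies
warning.case(n = 33, k = 4, alpha = 0.05)   # FALSE: no case applies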
Table 1a. Observed Type I Error Levels based on 10,000 Simulations, \(n =\) 3 to 5.
\(n\) | \(k\) | \(\hat{\alpha}\) (Assumed \(\alpha=0.05\)) | 95% LCL | 95% UCL | \(\hat{\alpha}\) (Assumed \(\alpha=0.01\)) | 95% LCL | 95% UCL |
3 | 1 | 0.047 | 0.043 | 0.051 | 0.009 | 0.007 | 0.010 |
4 | 1 | 0.049 | 0.045 | 0.053 | 0.010 | 0.008 | 0.012 |
2 | 0.107 | 0.101 | 0.113 | 0.021 | 0.018 | 0.024 | |
5 | 1 | 0.048 | 0.044 | 0.053 | 0.008 | 0.006 | 0.009 |
Table 1b. Observed Type I Error Levels based on 10,000 Simulations, \(n =\) 6 to 10.
\(n\) | \(k\) | \(\hat{\alpha}\) (Assumed \(\alpha=0.05\)) | 95% LCL | 95% UCL | \(\hat{\alpha}\) (Assumed \(\alpha=0.01\)) | 95% LCL | 95% UCL |
6 | 1 | 0.048 | 0.044 | 0.053 | 0.010 | 0.009 | 0.012 |
2 | 0.085 | 0.080 | 0.091 | 0.017 | 0.015 | 0.020 | |
3 | 0.141 | 0.134 | 0.148 | 0.028 | 0.025 | 0.031 | |
7 | 1 | 0.048 | 0.044 | 0.053 | 0.013 | 0.011 | 0.015 |
2 | 0.080 | 0.075 | 0.086 | 0.017 | 0.015 | 0.020 | |
3 | 0.112 | 0.106 | 0.118 | 0.022 | 0.019 | 0.025 | |
8 | 1 | 0.048 | 0.044 | 0.053 | 0.011 | 0.009 | 0.013 |
2 | 0.080 | 0.074 | 0.085 | 0.017 | 0.014 | 0.019 | |
3 | 0.102 | 0.096 | 0.108 | 0.020 | 0.017 | 0.023 | |
4 | 0.143 | 0.136 | 0.150 | 0.028 | 0.025 | 0.031 | |
9 | 1 | 0.052 | 0.048 | 0.057 | 0.010 | 0.008 | 0.012 |
2 | 0.069 | 0.064 | 0.074 | 0.014 | 0.012 | 0.016 | |
3 | 0.097 | 0.091 | 0.103 | 0.018 | 0.015 | 0.021 | |
4 | 0.120 | 0.114 | 0.126 | 0.024 | 0.021 | 0.027 | |
10 | 1 | 0.051 | 0.047 | 0.056 | 0.010 | 0.008 | 0.012 |
2 | 0.068 | 0.063 | 0.073 | 0.012 | 0.010 | 0.014 | |
3 | 0.085 | 0.080 | 0.091 | 0.015 | 0.013 | 0.017 | |
4 | 0.106 | 0.100 | 0.112 | 0.021 | 0.018 | 0.024 |
Table 1c. Observed Type I Error Levels based on 10,000 Simulations, \(n =\) 11 to 15.
\(n\) | \(k\) | \(\hat{\alpha}\) (Assumed \(\alpha=0.05\)) | 95% LCL | 95% UCL | \(\hat{\alpha}\) (Assumed \(\alpha=0.01\)) | 95% LCL | 95% UCL |
11 | 1 | 0.052 | 0.048 | 0.056 | 0.012 | 0.010 | 0.014 |
2 | 0.070 | 0.065 | 0.075 | 0.014 | 0.012 | 0.017 | |
3 | 0.082 | 0.077 | 0.088 | 0.014 | 0.011 | 0.016 | |
4 | 0.101 | 0.095 | 0.107 | 0.019 | 0.016 | 0.021 | |
5 | 0.116 | 0.110 | 0.123 | 0.022 | 0.019 | 0.024 | |
12 | 1 | 0.052 | 0.047 | 0.056 | 0.011 | 0.009 | 0.013 |
2 | 0.067 | 0.062 | 0.072 | 0.011 | 0.009 | 0.013 | |
3 | 0.074 | 0.069 | 0.080 | 0.016 | 0.013 | 0.018 | |
4 | 0.088 | 0.082 | 0.093 | 0.016 | 0.014 | 0.019 | |
5 | 0.099 | 0.093 | 0.105 | 0.016 | 0.013 | 0.018 | |
6 | 0.117 | 0.111 | 0.123 | 0.021 | 0.018 | 0.023 | |
13 | 1 | 0.048 | 0.044 | 0.052 | 0.010 | 0.008 | 0.012 |
2 | 0.064 | 0.059 | 0.069 | 0.014 | 0.012 | 0.016 | |
3 | 0.070 | 0.065 | 0.075 | 0.013 | 0.011 | 0.015 | |
4 | 0.079 | 0.074 | 0.084 | 0.014 | 0.012 | 0.017 | |
5 | 0.088 | 0.083 | 0.094 | 0.015 | 0.013 | 0.018 | |
6 | 0.109 | 0.103 | 0.115 | 0.020 | 0.017 | 0.022 | |
14 | 1 | 0.046 | 0.042 | 0.051 | 0.009 | 0.007 | 0.011 |
2 | 0.062 | 0.057 | 0.066 | 0.012 | 0.010 | 0.014 | |
3 | 0.069 | 0.064 | 0.074 | 0.012 | 0.010 | 0.014 | |
4 | 0.077 | 0.072 | 0.082 | 0.015 | 0.013 | 0.018 | |
5 | 0.084 | 0.079 | 0.090 | 0.016 | 0.013 | 0.018 | |
6 | 0.091 | 0.085 | 0.097 | 0.017 | 0.014 | 0.019 | |
7 | 0.107 | 0.101 | 0.113 | 0.018 | 0.016 | 0.021 | |
15 | 1 | 0.054 | 0.050 | 0.059 | 0.010 | 0.008 | 0.012 |
2 | 0.057 | 0.053 | 0.062 | 0.010 | 0.008 | 0.012 | |
3 | 0.065 | 0.060 | 0.069 | 0.013 | 0.011 | 0.016 | |
4 | 0.073 | 0.068 | 0.078 | 0.014 | 0.011 | 0.016 | |
5 | 0.074 | 0.069 | 0.079 | 0.012 | 0.010 | 0.014 | |
6 | 0.086 | 0.081 | 0.092 | 0.015 | 0.013 | 0.017 |
Table 1d. Observed Type I Error Levels based on 10,000 Simulations, \(n =\) 16 to 20.
\(n\) | \(k\) | \(\hat{\alpha}\) (Assumed \(\alpha=0.05\)) | 95% LCL | 95% UCL | \(\hat{\alpha}\) (Assumed \(\alpha=0.01\)) | 95% LCL | 95% UCL |
16 | 1 | 0.052 | 0.048 | 0.057 | 0.010 | 0.008 | 0.012 |
2 | 0.055 | 0.051 | 0.059 | 0.011 | 0.009 | 0.013 | |
3 | 0.068 | 0.063 | 0.073 | 0.011 | 0.009 | 0.013 | |
4 | 0.074 | 0.069 | 0.079 | 0.015 | 0.013 | 0.017 | |
5 | 0.077 | 0.072 | 0.082 | 0.015 | 0.013 | 0.018 | |
6 | 0.075 | 0.070 | 0.080 | 0.013 | 0.011 | 0.016 | |
7 | 0.087 | 0.082 | 0.093 | 0.017 | 0.014 | 0.020 | |
8 | 0.096 | 0.090 | 0.101 | 0.016 | 0.014 | 0.019 | |
17 | 1 | 0.047 | 0.043 | 0.051 | 0.008 | 0.007 | 0.010 |
2 | 0.059 | 0.054 | 0.063 | 0.011 | 0.009 | 0.013 | |
3 | 0.062 | 0.057 | 0.067 | 0.012 | 0.010 | 0.014 | |
4 | 0.070 | 0.065 | 0.075 | 0.012 | 0.009 | 0.014 | |
5 | 0.069 | 0.064 | 0.074 | 0.012 | 0.010 | 0.015 | |
6 | 0.071 | 0.066 | 0.076 | 0.015 | 0.012 | 0.017 | |
7 | 0.081 | 0.076 | 0.087 | 0.014 | 0.012 | 0.016 | |
8 | 0.083 | 0.078 | 0.088 | 0.015 | 0.013 | 0.017 | |
18 | 1 | 0.051 | 0.047 | 0.055 | 0.010 | 0.008 | 0.012 |
2 | 0.056 | 0.052 | 0.061 | 0.012 | 0.010 | 0.014 | |
3 | 0.065 | 0.060 | 0.070 | 0.012 | 0.010 | 0.015 | |
4 | 0.065 | 0.060 | 0.070 | 0.013 | 0.011 | 0.015 | |
5 | 0.069 | 0.064 | 0.074 | 0.012 | 0.010 | 0.014 | |
6 | 0.068 | 0.063 | 0.073 | 0.014 | 0.011 | 0.016 | |
7 | 0.072 | 0.067 | 0.077 | 0.014 | 0.011 | 0.016 | |
8 | 0.076 | 0.071 | 0.081 | 0.012 | 0.010 | 0.014 | |
9 | 0.081 | 0.076 | 0.086 | 0.012 | 0.010 | 0.014 | |
19 | 1 | 0.051 | 0.046 | 0.055 | 0.008 | 0.006 | 0.010 |
2 | 0.059 | 0.055 | 0.064 | 0.012 | 0.010 | 0.014 | |
3 | 0.059 | 0.054 | 0.064 | 0.011 | 0.009 | 0.013 | |
4 | 0.061 | 0.057 | 0.066 | 0.012 | 0.010 | 0.014 | |
5 | 0.067 | 0.062 | 0.072 | 0.013 | 0.010 | 0.015 | |
6 | 0.066 | 0.061 | 0.071 | 0.011 | 0.009 | 0.013 | |
7 | 0.069 | 0.064 | 0.074 | 0.013 | 0.011 | 0.015 | |
8 | 0.074 | 0.069 | 0.079 | 0.012 | 0.010 | 0.014 | |
9 | 0.082 | 0.077 | 0.087 | 0.015 | 0.013 | 0.018 | |
20 | 1 | 0.053 | 0.048 | 0.057 | 0.011 | 0.009 | 0.013 |
2 | 0.056 | 0.052 | 0.061 | 0.010 | 0.008 | 0.012 | |
3 | 0.060 | 0.056 | 0.065 | 0.009 | 0.007 | 0.011 | |
4 | 0.063 | 0.058 | 0.068 | 0.012 | 0.010 | 0.014 | |
5 | 0.063 | 0.059 | 0.068 | 0.014 | 0.011 | 0.016 | |
6 | 0.063 | 0.058 | 0.067 | 0.011 | 0.009 | 0.013 | |
7 | 0.065 | 0.061 | 0.070 | 0.011 | 0.009 | 0.013 | |
8 | 0.070 | 0.065 | 0.076 | 0.012 | 0.010 | 0.014 | |
9 | 0.076 | 0.070 | 0.081 | 0.013 | 0.011 | 0.015 |
Table 1e. Observed Type I Error Levels based on 10,000 Simulations, \(n =\) 21 to 25.
\(n\) | \(k\) | \(\hat{\alpha}\) (Assumed \(\alpha=0.05\)) | 95% LCL | 95% UCL | \(\hat{\alpha}\) (Assumed \(\alpha=0.01\)) | 95% LCL | 95% UCL |
21 | 1 | 0.054 | 0.049 | 0.058 | 0.013 | 0.011 | 0.015 |
2 | 0.054 | 0.049 | 0.058 | 0.012 | 0.010 | 0.014 | |
3 | 0.058 | 0.054 | 0.063 | 0.012 | 0.010 | 0.014 | |
4 | 0.058 | 0.054 | 0.063 | 0.011 | 0.009 | 0.013 | |
5 | 0.064 | 0.059 | 0.069 | 0.013 | 0.011 | 0.016 | |
6 | 0.066 | 0.061 | 0.071 | 0.012 | 0.010 | 0.015 | |
7 | 0.063 | 0.058 | 0.068 | 0.013 | 0.011 | 0.015 | |
8 | 0.066 | 0.061 | 0.071 | 0.010 | 0.008 | 0.012 | |
9 | 0.073 | 0.068 | 0.078 | 0.013 | 0.011 | 0.015 | |
10 | 0.071 | 0.066 | 0.076 | 0.012 | 0.010 | 0.014 | |
22 | 1 | 0.047 | 0.042 | 0.051 | 0.010 | 0.008 | 0.012 |
2 | 0.058 | 0.053 | 0.062 | 0.012 | 0.010 | 0.015 | |
3 | 0.056 | 0.052 | 0.061 | 0.010 | 0.008 | 0.012 | |
4 | 0.059 | 0.055 | 0.064 | 0.012 | 0.010 | 0.014 | |
5 | 0.061 | 0.057 | 0.066 | 0.009 | 0.008 | 0.011 | |
6 | 0.063 | 0.058 | 0.068 | 0.013 | 0.010 | 0.015 | |
7 | 0.065 | 0.060 | 0.070 | 0.013 | 0.010 | 0.015 | |
8 | 0.065 | 0.060 | 0.070 | 0.014 | 0.012 | 0.016 | |
9 | 0.065 | 0.060 | 0.070 | 0.012 | 0.010 | 0.014 | |
10 | 0.067 | 0.062 | 0.072 | 0.012 | 0.009 | 0.014 | |
23 | 1 | 0.051 | 0.047 | 0.056 | 0.008 | 0.007 | 0.010 |
2 | 0.056 | 0.052 | 0.061 | 0.010 | 0.009 | 0.012 | |
3 | 0.056 | 0.052 | 0.061 | 0.011 | 0.009 | 0.013 | |
4 | 0.062 | 0.057 | 0.066 | 0.011 | 0.009 | 0.013 | |
5 | 0.061 | 0.056 | 0.065 | 0.010 | 0.009 | 0.012 | |
6 | 0.060 | 0.055 | 0.064 | 0.012 | 0.010 | 0.014 | |
7 | 0.062 | 0.057 | 0.066 | 0.011 | 0.009 | 0.013 | |
8 | 0.063 | 0.058 | 0.068 | 0.012 | 0.010 | 0.014 | |
9 | 0.066 | 0.061 | 0.071 | 0.012 | 0.010 | 0.014 | |
10 | 0.068 | 0.063 | 0.073 | 0.014 | 0.012 | 0.017 | |
24 | 1 | 0.051 | 0.046 | 0.055 | 0.010 | 0.008 | 0.012 |
2 | 0.056 | 0.051 | 0.060 | 0.011 | 0.009 | 0.013 | |
3 | 0.058 | 0.053 | 0.062 | 0.010 | 0.008 | 0.012 | |
4 | 0.060 | 0.056 | 0.065 | 0.013 | 0.011 | 0.015 | |
5 | 0.057 | 0.053 | 0.062 | 0.012 | 0.010 | 0.014 | |
6 | 0.065 | 0.060 | 0.069 | 0.011 | 0.009 | 0.013 | |
7 | 0.062 | 0.057 | 0.066 | 0.012 | 0.010 | 0.014 | |
8 | 0.060 | 0.055 | 0.065 | 0.012 | 0.010 | 0.014 | |
9 | 0.066 | 0.061 | 0.071 | 0.012 | 0.010 | 0.014 | |
10 | 0.064 | 0.059 | 0.068 | 0.012 | 0.010 | 0.015 | |
25 | 1 | 0.054 | 0.050 | 0.059 | 0.012 | 0.009 | 0.014 |
2 | 0.055 | 0.051 | 0.060 | 0.010 | 0.008 | 0.012 | |
3 | 0.057 | 0.052 | 0.062 | 0.011 | 0.009 | 0.013 | |
4 | 0.055 | 0.051 | 0.060 | 0.011 | 0.009 | 0.013 | |
5 | 0.060 | 0.055 | 0.065 | 0.012 | 0.010 | 0.014 | |
6 | 0.060 | 0.055 | 0.064 | 0.011 | 0.009 | 0.013 | |
7 | 0.057 | 0.052 | 0.061 | 0.011 | 0.009 | 0.013 | |
8 | 0.062 | 0.058 | 0.067 | 0.011 | 0.009 | 0.013 | |
9 | 0.058 | 0.053 | 0.062 | 0.012 | 0.010 | 0.014 |
Table 1f. Observed Type I Error Levels based on 10,000 Simulations, \(n =\) 26 to 30.
\(n\) | \(k\) | \(\hat{\alpha}\) (Assumed \(\alpha=0.05\)) | 95% LCL | 95% UCL | \(\hat{\alpha}\) (Assumed \(\alpha=0.01\)) | 95% LCL | 95% UCL |
26 | 1 | 0.051 | 0.047 | 0.055 | 0.012 | 0.010 | 0.014 |
2 | 0.057 | 0.053 | 0.062 | 0.013 | 0.011 | 0.015 | |
3 | 0.055 | 0.050 | 0.059 | 0.012 | 0.010 | 0.014 | |
4 | 0.055 | 0.051 | 0.060 | 0.010 | 0.008 | 0.012 | |
5 | 0.058 | 0.054 | 0.063 | 0.011 | 0.009 | 0.013 | |
6 | 0.061 | 0.056 | 0.066 | 0.012 | 0.010 | 0.014 | |
7 | 0.059 | 0.054 | 0.064 | 0.011 | 0.009 | 0.013 | |
8 | 0.060 | 0.056 | 0.065 | 0.010 | 0.008 | 0.012 | |
9 | 0.060 | 0.056 | 0.065 | 0.011 | 0.009 | 0.013 | |
10 | 0.061 | 0.056 | 0.065 | 0.011 | 0.009 | 0.013 | |
27 | 1 | 0.050 | 0.046 | 0.054 | 0.009 | 0.007 | 0.011 |
2 | 0.054 | 0.050 | 0.059 | 0.011 | 0.009 | 0.013 | |
3 | 0.062 | 0.057 | 0.066 | 0.012 | 0.010 | 0.014 | |
4 | 0.063 | 0.058 | 0.068 | 0.011 | 0.009 | 0.013 | |
5 | 0.051 | 0.047 | 0.055 | 0.010 | 0.008 | 0.012 | |
6 | 0.058 | 0.053 | 0.062 | 0.011 | 0.009 | 0.013 | |
7 | 0.060 | 0.056 | 0.065 | 0.010 | 0.008 | 0.012 | |
8 | 0.056 | 0.052 | 0.061 | 0.010 | 0.008 | 0.012 | |
9 | 0.061 | 0.056 | 0.066 | 0.012 | 0.010 | 0.014 | |
10 | 0.055 | 0.051 | 0.060 | 0.008 | 0.006 | 0.010 | |
28 | 1 | 0.049 | 0.045 | 0.053 | 0.010 | 0.008 | 0.011 |
2 | 0.057 | 0.052 | 0.061 | 0.011 | 0.009 | 0.013 | |
3 | 0.056 | 0.052 | 0.061 | 0.012 | 0.009 | 0.014 | |
4 | 0.057 | 0.053 | 0.062 | 0.011 | 0.009 | 0.013 | |
5 | 0.057 | 0.053 | 0.062 | 0.010 | 0.008 | 0.012 | |
6 | 0.056 | 0.051 | 0.060 | 0.010 | 0.008 | 0.012 | |
7 | 0.057 | 0.052 | 0.061 | 0.010 | 0.008 | 0.012 | |
8 | 0.058 | 0.054 | 0.063 | 0.011 | 0.009 | 0.013 | |
9 | 0.054 | 0.050 | 0.058 | 0.011 | 0.009 | 0.013 | |
10 | 0.062 | 0.057 | 0.067 | 0.011 | 0.009 | 0.013 | |
29 | 1 | 0.049 | 0.045 | 0.053 | 0.011 | 0.009 | 0.013 |
2 | 0.053 | 0.048 | 0.057 | 0.010 | 0.008 | 0.012 | |
3 | 0.056 | 0.051 | 0.060 | 0.010 | 0.009 | 0.012 | |
4 | 0.055 | 0.050 | 0.059 | 0.010 | 0.008 | 0.012 | |
5 | 0.056 | 0.051 | 0.060 | 0.010 | 0.008 | 0.012 | |
6 | 0.057 | 0.053 | 0.062 | 0.012 | 0.010 | 0.014 | |
7 | 0.055 | 0.050 | 0.059 | 0.010 | 0.008 | 0.012 | |
8 | 0.057 | 0.052 | 0.061 | 0.011 | 0.009 | 0.013 | |
9 | 0.056 | 0.051 | 0.061 | 0.011 | 0.009 | 0.013 | |
10 | 0.057 | 0.052 | 0.061 | 0.011 | 0.009 | 0.013 | |
30 | 1 | 0.050 | 0.046 | 0.054 | 0.009 | 0.007 | 0.011 |
2 | 0.054 | 0.049 | 0.058 | 0.011 | 0.009 | 0.013 | |
3 | 0.056 | 0.052 | 0.061 | 0.012 | 0.010 | 0.015 | |
4 | 0.054 | 0.049 | 0.058 | 0.010 | 0.008 | 0.012 | |
5 | 0.058 | 0.053 | 0.063 | 0.012 | 0.010 | 0.014 | |
6 | 0.062 | 0.058 | 0.067 | 0.012 | 0.010 | 0.014 | |
7 | 0.056 | 0.052 | 0.061 | 0.012 | 0.010 | 0.014 | |
8 | 0.059 | 0.054 | 0.064 | 0.011 | 0.009 | 0.013 | |
9 | 0.056 | 0.052 | 0.061 | 0.010 | 0.009 | 0.012 |
Table 1g. Observed Type I Error Levels based on 10,000 Simulations, \(n =\) 31 to 35.
\(n\) | \(k\) | \(\hat{\alpha}\) (Assumed \(\alpha=0.05\)) | 95% LCL | 95% UCL | \(\hat{\alpha}\) (Assumed \(\alpha=0.01\)) | 95% LCL | 95% UCL |
31 | 1 | 0.051 | 0.047 | 0.056 | 0.009 | 0.007 | 0.011 |
2 | 0.054 | 0.050 | 0.059 | 0.010 | 0.009 | 0.012 | |
3 | 0.053 | 0.049 | 0.058 | 0.010 | 0.008 | 0.012 | |
4 | 0.055 | 0.050 | 0.059 | 0.010 | 0.008 | 0.012 | |
5 | 0.053 | 0.049 | 0.057 | 0.011 | 0.009 | 0.013 | |
6 | 0.055 | 0.050 | 0.059 | 0.010 | 0.008 | 0.012 | |
7 | 0.055 | 0.050 | 0.059 | 0.012 | 0.010 | 0.014 | |
8 | 0.056 | 0.051 | 0.060 | 0.010 | 0.008 | 0.012 | |
9 | 0.057 | 0.053 | 0.062 | 0.011 | 0.009 | 0.013 | |
10 | 0.058 | 0.053 | 0.062 | 0.011 | 0.009 | 0.013 | |
32 | 1 | 0.054 | 0.049 | 0.058 | 0.010 | 0.008 | 0.012 |
2 | 0.054 | 0.050 | 0.059 | 0.010 | 0.008 | 0.012 | |
3 | 0.052 | 0.047 | 0.056 | 0.009 | 0.007 | 0.011 | |
4 | 0.056 | 0.051 | 0.060 | 0.011 | 0.009 | 0.013 | |
5 | 0.056 | 0.052 | 0.061 | 0.011 | 0.009 | 0.013 | |
6 | 0.055 | 0.051 | 0.060 | 0.011 | 0.009 | 0.013 | |
7 | 0.055 | 0.051 | 0.060 | 0.010 | 0.008 | 0.012 | |
8 | 0.055 | 0.051 | 0.060 | 0.010 | 0.008 | 0.012 | |
9 | 0.057 | 0.053 | 0.062 | 0.012 | 0.010 | 0.014 | |
10 | 0.054 | 0.050 | 0.059 | 0.010 | 0.008 | 0.012 | |
33 | 1 | 0.051 | 0.046 | 0.055 | 0.011 | 0.009 | 0.013 |
2 | 0.055 | 0.051 | 0.060 | 0.011 | 0.009 | 0.013 | |
3 | 0.056 | 0.052 | 0.061 | 0.010 | 0.008 | 0.012 | |
4 | 0.052 | 0.048 | 0.057 | 0.010 | 0.008 | 0.012 | |
5 | 0.055 | 0.050 | 0.059 | 0.010 | 0.008 | 0.012 | |
6 | 0.058 | 0.053 | 0.062 | 0.011 | 0.009 | 0.013 | |
7 | 0.057 | 0.052 | 0.061 | 0.010 | 0.008 | 0.012 | |
8 | 0.058 | 0.054 | 0.063 | 0.011 | 0.009 | 0.013 | |
9 | 0.057 | 0.053 | 0.062 | 0.012 | 0.010 | 0.014 | |
10 | 0.055 | 0.051 | 0.060 | 0.011 | 0.009 | 0.013 | |
34 | 1 | 0.052 | 0.048 | 0.056 | 0.009 | 0.007 | 0.011 |
2 | 0.053 | 0.049 | 0.058 | 0.011 | 0.009 | 0.013 | |
3 | 0.055 | 0.050 | 0.059 | 0.012 | 0.010 | 0.014 | |
4 | 0.056 | 0.052 | 0.061 | 0.010 | 0.008 | 0.012 | |
5 | 0.053 | 0.048 | 0.057 | 0.009 | 0.007 | 0.011 | |
6 | 0.055 | 0.050 | 0.059 | 0.010 | 0.008 | 0.012 | |
7 | 0.052 | 0.048 | 0.057 | 0.012 | 0.010 | 0.014 | |
8 | 0.055 | 0.050 | 0.059 | 0.009 | 0.008 | 0.011 | |
9 | 0.055 | 0.051 | 0.060 | 0.011 | 0.009 | 0.013 | |
10 | 0.054 | 0.049 | 0.058 | 0.010 | 0.008 | 0.012 | |
35 | 1 | 0.051 | 0.046 | 0.055 | 0.010 | 0.009 | 0.012 |
2 | 0.054 | 0.049 | 0.058 | 0.010 | 0.009 | 0.012 | |
3 | 0.055 | 0.050 | 0.059 | 0.010 | 0.009 | 0.012 | |
4 | 0.053 | 0.048 | 0.057 | 0.011 | 0.009 | 0.013 | |
5 | 0.056 | 0.051 | 0.061 | 0.011 | 0.009 | 0.013 | |
6 | 0.055 | 0.051 | 0.059 | 0.012 | 0.010 | 0.014 | |
7 | 0.054 | 0.050 | 0.059 | 0.011 | 0.009 | 0.013 | |
8 | 0.054 | 0.049 | 0.058 | 0.011 | 0.009 | 0.013 | |
9 | 0.061 | 0.056 | 0.066 | 0.012 | 0.010 | 0.014 |
Table 1h. Observed Type I Error Levels based on 10,000 Simulations, \(n =\) 36 to 40.
\(n\) | \(k\) | \(\hat{\alpha}\) (Assumed \(\alpha=0.05\)) | 95% LCL | 95% UCL | \(\hat{\alpha}\) (Assumed \(\alpha=0.01\)) | 95% LCL | 95% UCL |
36 | 1 | 0.047 | 0.043 | 0.051 | 0.010 | 0.008 | 0.012 |
2 | 0.058 | 0.053 | 0.062 | 0.012 | 0.010 | 0.015 | |
3 | 0.052 | 0.047 | 0.056 | 0.009 | 0.007 | 0.011 | |
4 | 0.052 | 0.048 | 0.056 | 0.012 | 0.010 | 0.014 | |
5 | 0.052 | 0.048 | 0.057 | 0.010 | 0.008 | 0.012 | |
6 | 0.055 | 0.051 | 0.059 | 0.012 | 0.010 | 0.014 | |
7 | 0.053 | 0.048 | 0.057 | 0.011 | 0.009 | 0.013 | |
8 | 0.056 | 0.051 | 0.060 | 0.012 | 0.010 | 0.014 | |
9 | 0.056 | 0.051 | 0.060 | 0.011 | 0.009 | 0.013 | |
10 | 0.056 | 0.051 | 0.060 | 0.011 | 0.009 | 0.013 | |
37 | 1 | 0.050 | 0.046 | 0.055 | 0.010 | 0.008 | 0.012 |
2 | 0.054 | 0.049 | 0.058 | 0.011 | 0.009 | 0.013 | |
3 | 0.054 | 0.049 | 0.058 | 0.011 | 0.009 | 0.013 | |
4 | 0.054 | 0.050 | 0.058 | 0.010 | 0.008 | 0.012 | |
5 | 0.054 | 0.049 | 0.058 | 0.010 | 0.008 | 0.012 | |
6 | 0.054 | 0.050 | 0.058 | 0.011 |
Barnett, V., and T. Lewis. (1995). Outliers in Statistical Data. Third Edition. John Wiley & Sons, Chichester, UK, pp. 235--236.
Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, NY, pp.188--191.
McBean, E.A., and F.A. Rovers. (1992). Estimation of the Probability of Exceedance of Contaminant Concentrations. Ground Water Monitoring Review, Winter, pp. 115--119.
McNutt, M. (2014). Raising the Bar. Science 345(6192), p. 9.
Rosner, B. (1975). On the Detection of Many Outliers. Technometrics 17, 221--227.
Rosner, B. (1983). Percentage Points for a Generalized ESD Many-Outlier Procedure. Technometrics 25, 165--172.
USEPA. (2006). Data Quality Assessment: A Reviewer's Guide. EPA QA/G-9R. EPA/240/B-06/002, February 2006. Office of Environmental Information, U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C., pp. 12-10 to 12-14.
USEPA. (2013a). ProUCL Version 5.0.00 Technical Guide. EPA/600/R-07/041, September 2013. Office of Research and Development. U.S. Environmental Protection Agency, Washington, D.C., pp. 190--195.
USEPA. (2013b). ProUCL Version 5.0.00 User Guide. EPA/600/R-07/041, September 2013. Office of Research and Development. U.S. Environmental Protection Agency, Washington, D.C., pp. 190--195.
gofTest, gofOutlier.object, print.gofOutlier, Normal, qqPlot.
# Combine 30 observations from a normal distribution with mean 3 and
# standard deviation 2, with 3 observations from a normal distribution
# with mean 10 and standard deviation 1, then run Rosner's Test on these
# data, specifying k=4 potential outliers based on looking at the
# normal Q-Q plot.
# (Note: the call to set.seed simply allows you to reproduce
# this example.)
set.seed(250)
dat <- c(rnorm(30, mean = 3, sd = 2), rnorm(3, mean = 10, sd = 1))
dev.new()
qqPlot(dat)
rosnerTest(dat, k = 4)
#Results of Outlier Test
#-------------------------
#
#Test Method: Rosner's Test for Outliers
#
#Hypothesized Distribution: Normal
#
#Data: dat
#
#Sample Size: 33
#
#Test Statistics: R.1 = 2.848514
# R.2 = 3.086875
# R.3 = 3.033044
# R.4 = 2.380235
#
#Test Statistic Parameter: k = 4
#
#Alternative Hypothesis: Up to 4 observations are not
# from the same Distribution.
#
#Type I Error: 5%
#
#Number of Outliers Detected: 3
#
# i Mean.i SD.i Value Obs.Num R.i+1 lambda.i+1 Outlier
#1 0 3.549744 2.531011 10.7593656 33 2.848514 2.951949 TRUE
#2 1 3.324444 2.209872 10.1460427 31 3.086875 2.938048 TRUE
#3 2 3.104392 1.856109 8.7340527 32 3.033044 2.923571 TRUE
#4 3 2.916737 1.560335 -0.7972275 25 2.380235 2.908473 FALSE
#----------
# Clean up
rm(dat)
graphics.off()
#--------------------------------------------------------------------
# Example 12-4 of USEPA (2009, page 12-12) gives an example of
# using Rosner's test to test for outliers in naphthalene measurements (ppb)
# taken at 5 background wells over 5 quarters. The data for this example
# are stored in EPA.09.Ex.12.4.naphthalene.df.
EPA.09.Ex.12.4.naphthalene.df
# Quarter Well Naphthalene.ppb
#1 1 BW.1 3.34
#2 2 BW.1 5.39
#3 3 BW.1 5.74
# ...
#23 3 BW.5 5.53
#24 4 BW.5 4.42
#25 5 BW.5 35.45
longToWide(EPA.09.Ex.12.4.naphthalene.df, "Naphthalene.ppb", "Quarter", "Well",
paste.row.name = TRUE)
# BW.1 BW.2 BW.3 BW.4 BW.5
#Quarter.1 3.34 5.59 1.91 6.12 8.64
#Quarter.2 5.39 5.96 1.74 6.05 5.34
#Quarter.3 5.74 1.47 23.23 5.18 5.53
#Quarter.4 6.88 2.57 1.82 4.43 4.42
#Quarter.5 5.85 5.39 2.02 1.00 35.45
# Look at Q-Q plots for both the raw and log-transformed data
#------------------------------------------------------------
dev.new()
with(EPA.09.Ex.12.4.naphthalene.df,
qqPlot(Naphthalene.ppb, add.line = TRUE,
main = "Figure 12-6. Naphthalene Probability Plot"))
dev.new()
with(EPA.09.Ex.12.4.naphthalene.df,
qqPlot(Naphthalene.ppb, dist = "lnorm", add.line = TRUE,
main = "Figure 12-7. Log Naphthalene Probability Plot"))
# Test for 2 potential outliers on the original scale:
#-----------------------------------------------------
with(EPA.09.Ex.12.4.naphthalene.df, rosnerTest(Naphthalene.ppb, k = 2))
#Results of Outlier Test
#-------------------------
#
#Test Method: Rosner's Test for Outliers
#
#Hypothesized Distribution: Normal
#
#Data: Naphthalene.ppb
#
#Sample Size: 25
#
#Test Statistics: R.1 = 3.930957
# R.2 = 4.160223
#
#Test Statistic Parameter: k = 2
#
#Alternative Hypothesis: Up to 2 observations are not
# from the same Distribution.
#
#Type I Error: 5%
#
#Number of Outliers Detected: 2
#
# i Mean.i SD.i Value Obs.Num R.i+1 lambda.i+1 Outlier
#1 0 6.44240 7.379271 35.45 25 3.930957 2.821681 TRUE
#2 1 5.23375 4.325790 23.23 13 4.160223 2.801551 TRUE
#----------
# Clean up
graphics.off()