The functions implement the Wilson (1993) outlier detection method. One version is written entirely in R and the other in C++.
outlier.ap(X, Y, NDEL = 3, NLEN = 25, TRANSPOSE = FALSE)
outlierC.ap(X, Y, NDEL = 3, NLEN = 25, TRANSPOSE = FALSE)
outlier.ap.plot(ratio, NLEN = 25, xlab = "Number of firms deleted",
                ylab = "Log ratio", ..., ylim)
ratio: A min(NLEN, K) x NDEL matrix with the log-ratios to be plotted, where K is the number of firms.
imat: An NDEL x NDEL matrix with the indices of the deleted firms.
rmin: An array of length NDEL with the minimum value \(R^{(i)}_{\min}\) for each number of deleted firms.
X: Input as a firms times goods matrix; see TRANSPOSE.
Y: Output as a firms times goods matrix; see TRANSPOSE.
NDEL: The maximum number of firms to be considered as a group of outliers, i.e. the maximum number of firms to be deleted.
NLEN: The number of ratios to save for each level of removal, i.e. the number of rows in ratio.
TRANSPOSE: Input and output matrices are treated as firms times goods matrices for the default value TRANSPOSE = FALSE, corresponding to the standard in R for statistical models. When TRUE, data matrices are transposed to goods times firms matrices, as is normally used in the LP formulation of the problem.
ratio: The ratio component of the list returned by outlier.ap.
xlab: Label for the x-axis.
ylab: Label for the y-axis.
ylim: The y limits (y1, y2) of the plot, a vector of length 2.
...: Usual options for the methods plot and lines.
Peter Bogetoft and Lars Otto larsot23@gmail.com
outlier.ap is an implementation of the method in Wilson (1993) using only R functions, in particular the function det to calculate \(R^{(i)}_{\min}\). The alternative function outlierC.ap is written entirely in C++ and is much faster, but still not as fast as the method in FEAR.
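To make the determinant computation concrete, here is a minimal base-R sketch. It assumes Wilson's definition \(R^{(i)}_L = \det(Z_L^\top Z_L) / \det(Z^\top Z)\), where Z = cbind(X, Y) has one row per firm and \(Z_L\) drops the i firms in the set L, so that \(R^{(i)}_{\min}\) is the minimum over all sets of i firms. The helper wilson_ap_sketch is hypothetical and is not the package's outlier.ap; in particular, taking the plotted log-ratios as log(R / R_min) for the NLEN smallest ratios is an assumption.

```r
# Hypothetical helper, NOT the package's outlier.ap(): a base-R sketch of the
# Wilson (1993) determinant ratios, assuming firms are rows (TRANSPOSE = FALSE).
wilson_ap_sketch <- function(X, Y, NDEL = 2, NLEN = 25) {
  Z <- cbind(as.matrix(X), as.matrix(Y))  # firms x (inputs + outputs)
  K <- nrow(Z)
  D <- det(crossprod(Z))                  # det(Z'Z) using all K firms
  nl <- min(NLEN, K)
  ratio <- matrix(NA_real_, nl, NDEL)     # log-ratios, one column per i
  rmin <- numeric(NDEL)
  imin <- vector("list", NDEL)
  for (i in seq_len(NDEL)) {
    combs <- utils::combn(K, i)           # each column is one set L of i firms
    # R^(i)_L = det(Z_L'Z_L) / det(Z'Z), where Z_L drops the firms in L
    R <- apply(combs, 2, function(L) det(crossprod(Z[-L, , drop = FALSE])) / D)
    j <- which.min(R)
    rmin[i] <- R[j]
    imin[[i]] <- combs[, j]               # firms whose removal gives R^(i)_min
    Rs <- sort(R)[seq_len(min(nl, length(R)))]
    ratio[seq_along(Rs), i] <- log(Rs / rmin[i])  # assumed form: log(R / R_min)
  }
  list(ratio = ratio, rmin = rmin, imin = imin)
}

# Usage: ten firms close to the line y = 2x, except firm 10, far above it
set.seed(1)
x <- 1:10
y <- 2 * x + c(rnorm(9, sd = 0.1), 80)
res <- wilson_ap_sketch(x, y, NDEL = 2)
res$imin[[1]]   # 10: removing firm 10 shrinks det(Z'Z) the most
```

Since removing firms can only shrink \(\det(Z^\top Z)\), every ratio lies in (0, 1] and \(R^{(i)}_{\min}\) cannot increase with i; a clear outlier shows up as a sharp drop in \(R^{(i)}_{\min}\).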
An elementary presentation of the method is found in Bogetoft and Otto (2011), Sect. 5.13 on outliers.
For a data set with 10 firms, and considering at most 3 outliers, there are 175 combinations of firms to delete. For 100 firms there are 166,750 combinations; for at most 5 outliers there are 79,375,495 combinations, and for at most 8 outliers there are 203,366,882,995 combinations. For 200 firms with at most 3, 5, and 8 outliers there are respectively 1,333,500, 2,601,668,490, and 57,467,902,686,615 (about 57 trillion) combinations. Thus the number of combinations, and with it the computing time, grows very rapidly in both the number of firms and the number of firms to be deleted; for K firms the largest term, choose(K, NDEL), is roughly K^NDEL/NDEL!. You should therefore limit NDEL to a very small number, at most 3 or perhaps 5 depending on the number of firms, or use the extremely fast function ap from the package FEAR mentioned in the references.
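The counts above are partial sums of binomial coefficients, \(\sum_{i=1}^{\mathrm{NDEL}} \binom{K}{i}\) for K firms, and can be checked in base R before choosing NDEL. The helper name n_deletions is ours and is not part of the package.

```r
# Number of ways to delete between 1 and NDEL firms out of K firms:
# sum over i = 1, ..., NDEL of choose(K, i)
n_deletions <- function(K, NDEL) sum(choose(K, seq_len(NDEL)))

n_deletions(10, 3)    # 175
n_deletions(100, 5)   # 79375495
# print a large count with all digits and thousands separators
formatC(n_deletions(200, 8), format = "f", digits = 0, big.mark = ",")
```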
Bogetoft and Otto (2011), Benchmarking with DEA, SFA, and R, Springer.
Wilson (1993), “Detecting outliers in deterministic nonparametric frontier models with multiple outputs,” Journal of Business and Economic Statistics 11, 319-323.
Wilson (2008), “FEAR 1.0: A Software Package for Frontier Efficiency Analysis with R,” Socio-Economic Planning Sciences 42, 247-254.
The function ap in the package FEAR.
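library(Benchmarking)               # provides outlier.ap and outlier.ap.plot
set.seed(1)                         # make the simulated data reproducible
n <- 25
x <- matrix(rnorm(n))               # one input for n firms
y <- .5 + 2.5*x + 2*rnorm(n)        # one noisy output
tap <- outlier.ap(x, y, NDEL = 2)   # consider deleting up to 2 firms
# deleted firms and minimum ratio for each number of deleted firms
print(cbind(tap$imat, tap$rmin), na.print = "", digits = 2)
outlier.ap.plot(tap$ratio)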