outlier.ap: Detection of outliers in benchmark models

Description

The functions implement the Wilson (1993) outlier detection method. One is written entirely in R and the other in C++.

Usage

outlier.ap (X, Y, NDEL = 3, NLEN = 25, TRANSPOSE = FALSE)
outlierC.ap(X, Y, NDEL = 3, NLEN = 25, TRANSPOSE = FALSE)
outlier.ap.plot(ratio, NLEN = 25, xlab = "Number of firms deleted", 
                ylab = "Log ratio", ..., ylim)

Value

ratio: A min(NLEN,K) x NDEL matrix with the log-ratios to be plotted.
imat: An NDEL x NDEL matrix with indices for deleted firms.
r0: An array of length NDEL with the minimum value \(R^{(i)}_{\min}\) for each number of deleted firms.

Arguments

X: Input as a firms times goods matrix, see TRANSPOSE.
Y: Output as a firms times goods matrix, see TRANSPOSE.
NDEL: The maximum number of firms to be considered as a group of outliers, i.e. the maximum number of firms to be deleted.
NLEN: The number of ratios to save for each level of removal, i.e. the number of rows used in ratio.
TRANSPOSE: Input and output matrices are treated as firms times goods matrices for the default value TRANSPOSE=FALSE, corresponding to the standard in R for statistical models. When TRUE, data matrices are transposed to goods times firms matrices, as is normally used in LP formulations of the problem.
ratio: The ratio component from the list as output from outlier.ap.
xlab: Label for the x-axis.
ylab: Label for the y-axis.
ylim: The y limits (y1, y2) of the plot, an array/vector of length 2.
...: Usual options for the methods plot and lines.

Author

Peter Bogetoft and Lars Otto larsot23@gmail.com

Details

An implementation of the method in Wilson (1993) using only R functions, in particular the function det to calculate \(R^{(i)}_{\min}\). The alternative method outlierC.ap is written completely in C++ and is much faster, but still not as fast as the method in FEAR.

An elementary presentation of the method is found in Bogetoft and Otto (2011), Sect. 5.13 on outliers.
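
As a rough illustration of the determinant ratio behind the method, the sketch below assumes that the statistic for a set L of deleted firms is det(Z_L'Z_L)/det(Z'Z), where Z = [X Y] is a firms times goods matrix and Z_L is Z without the rows in L. The helper ap_ratio_sketch and the toy data are hypothetical and are not the code used in outlier.ap or outlierC.ap; the helper enumerates every subset with combn and is only practical for very small data sets.

```r
# Hypothetical brute-force sketch; assumes the ratio det(Z_L'Z_L) / det(Z'Z)
ap_ratio_sketch <- function(X, Y, ndel = 1) {
  Z <- cbind(X, Y)                      # firms in rows, goods in columns
  d_full <- det(crossprod(Z))           # det(Z'Z) for the full data set
  sets <- combn(nrow(Z), ndel)          # each column is one set of firms to delete
  r <- apply(sets, 2, function(del) {
    det(crossprod(Z[-del, , drop = FALSE])) / d_full
  })
  best <- which.min(r)                  # the set whose removal shrinks det the most
  list(rmin = r[best],
       deleted = sets[, best],
       logratio = sort(log(r / r[best])))
}

# Toy data: firm 6 lies far from the other five firms
x <- c(1, 2, 3, 4, 5, 20)
y <- c(1.1, 1.9, 3.2, 3.9, 5.1, 2)
res <- ap_ratio_sketch(x, y, ndel = 1)
res$deleted   # 6
```

The sorted log-ratios start at 0 for the minimizing set; a large gap to the second value suggests that the deleted firms stand apart from the rest of the data.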

For a data set with 10 firms and at most 3 outliers there are 175 combinations of firms to delete. For 100 firms there are 166,750 combinations for at most 3 outliers, 79,375,495 for at most 5 outliers, and 203,366,882,995 for at most 8 outliers. For 200 firms with at most 3, 5, and 8 outliers there are, respectively, 1,333,500, 2,601,668,490, and 57,467,902,686,615 combinations, the last a number we do not know what to call. The number of combinations thus grows very rapidly in both the number of firms and the number of firms to be deleted, and so does the computational time. You should therefore limit NDEL to a very small number, at most 3 or perhaps 5 depending on the number of firms, or use the extremely fast method ap from the package FEAR mentioned in the references.
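
The counts above are sums of binomial coefficients and can be reproduced in base R. The helper name n_groups is hypothetical and only serves this illustration.

```r
# Number of candidate groups when up to ndel of n firms may be deleted:
# choose(n, 1) + choose(n, 2) + ... + choose(n, ndel)
n_groups <- function(n, ndel) sum(choose(n, seq_len(ndel)))

n_groups(10, 3)    # 175
n_groups(100, 3)   # 166750
n_groups(100, 5)   # 79375495
n_groups(200, 3)   # 1333500
```

Evaluating n_groups for the intended number of firms and NDEL before calling outlier.ap gives a quick impression of how many determinants have to be computed.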

References

Bogetoft and Otto (2011), Benchmarking with DEA, SFA, and R, Springer.

Wilson (1993), “Detecting outliers in deterministic nonparametric frontier models with multiple outputs,” Journal of Business and Economic Statistics 11, 319-323.

Wilson (2008), “FEAR 1.0: A Software Package for Frontier Efficiency Analysis with R,” Socio-Economic Planning Sciences 42, 247-254.

Examples

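library(Benchmarking)
# Simulate one input and one output for n firms
n <- 25
x <- matrix(rnorm(n))
y <- .5 + 2.5*x + 2*rnorm(n)
# Search for groups of at most 2 outliers
tap <- outlier.ap(x,y, NDEL=2)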
print(cbind(tap$imat,tap$r0), na.print="", digits=2)
outlier.ap.plot(tap$ratio)