LCS: Largest consistent subset

Description

Calculates a 'largest consistent subset' given values and associated uncertainty information.

Usage

LCS(x, u, p = 0.05, method = "enum", simplify = FALSE, 
	verbose = FALSE)

Value

If there is only one subset of maximum size, or if simplify=TRUE, a vector of indices into x identifying the largest consistent subset.

If there is more than one subset of maximum size and simplify=FALSE, a matrix of indices in which each row contains the indices of one subset.

Arguments

x: Vector of observations.
u: Vector of standard errors or standard uncertainties associated with x.
p: Significance level at which consistency is tested.
method: Subset identification method. Currently only 'enum' is supported.
simplify: If TRUE, only the lowest-uncertainty subset is returned, even if several consistent subsets share the maximum size.
verbose: Logical: Controls the level of reporting during the search.

Author

S. Ellison s.ellison@lgc.co.uk

Warning

LCS methods are essentially equivalent to unsupervised outlier rejection. In general, this results in a possibly extremely low estimated variance for an arbitrarily small subset (in the limit of gross inconsistency, LCS will return subsets of size 1). The uncertainty calculated for the Graybill-Deal weighted mean of the subset(s) does not generally take account of the subset selection process or the dispersion of the complete data set, and so is not an estimate of sampling variance.

LCS is therefore not recommended for consensus value estimation. It is, however, quite useful for identifying value/uncertainty outliers.
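The size-1 limit mentioned above can be seen with a few made-up values that disagree far beyond their stated uncertainties. The sketch below assumes a chi-squared test on weights 1/u^2 of the kind described under Details, and treats u as known when computing the Graybill-Deal uncertainty; the data and the helper u_gd are purely illustrative and not part of the package.

```r
# Illustrative (made-up) data: values that disagree far beyond their
# stated standard uncertainties.
x <- c(9, 10, 11, 12)
u <- rep(0.05, 4)

# Hypothetical helper: Graybill-Deal standard uncertainty, treating u as known
u_gd <- function(u) sqrt(1 / sum(1 / u^2))

# For a pair with equal u, the chi-squared statistic is gap^2 / (2 * u^2).
# Even the closest pair gives 200 on 1 degree of freedom, and adding values
# cannot reduce the statistic, so no subset of two or more values passes.
min_gap <- min(diff(sort(x)))
chi2_pair <- min_gap^2 / (2 * u[1]^2)
pchisq(chi2_pair, df = 1, lower.tail = FALSE)   # far below p = 0.05

# A size-1 subset reports only that value's own standard uncertainty ...
u_gd(u[1])                                      # 0.05
# ... which is much smaller than a dispersion-based standard error.
sd(x) / sqrt(length(x))                         # about 0.65
```

The contrast between the last two lines is the point of the Warning: the reported uncertainty reflects the stated u of the retained value, not the scatter of the full data set.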

Details

LCS obtains the largest subset(s) of x which pass a chi-squared test for consistency, taking the uncertainties u into account.
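As a rough illustration, a chi-squared consistency check of the kind described by Cox (2007) can be written in a few lines of base R: form the weighted mean with weights 1/u^2, sum the squared weighted deviations, and call the set consistent if the upper-tail probability on n - 1 degrees of freedom is at least p. The function name is_consistent is hypothetical, and the exact test used inside LCS may differ in detail.

```r
# Hypothetical helper (not part of metRology): chi-squared consistency check
# for values x with standard uncertainties u, in the style of Cox (2007).
is_consistent <- function(x, u, p = 0.05) {
  n <- length(x)
  if (n < 2) return(TRUE)               # a single value is trivially consistent
  w <- 1 / u^2                          # inverse-variance weights
  y <- sum(w * x) / sum(w)              # weighted (Graybill-Deal) mean
  chi2 <- sum(w * (x - y)^2)            # observed chi-squared statistic
  # Consistent if the upper-tail probability on n - 1 degrees of freedom
  # is at least the significance level p
  pchisq(chi2, df = n - 1, lower.tail = FALSE) >= p
}
```

For example, is_consistent(c(10.1, 10.0, 9.9), rep(0.1, 3)) gives an observed statistic of about 2 on 2 degrees of freedom and returns TRUE. Single values are special-cased because a chi-squared distribution on zero degrees of freedom gives no usable tail probability.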

method controls the search method used. Method "enum" uses complete enumeration of all subsets of size n, starting at n==length(x) and decreasing n until at least one consistent subset is found. No other method is currently supported; if a different method is specified, LCS issues a warning and continues with "enum".
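The "enum" strategy can be sketched in base R as follows. This is a hypothetical reimplementation for illustration only (lcs_enum is not a package function): it assumes a chi-squared criterion on weights 1/u^2 with single values treated as consistent, always returns a matrix with one subset per row, and does not apply the simplify rule.

```r
# Hypothetical sketch of the "enum" search (not the metRology source):
# enumerate all subsets of size n, from length(x) downwards, and stop at
# the first size for which at least one subset passes the consistency test.
lcs_enum <- function(x, u, p = 0.05) {
  stopifnot(length(x) >= 1, length(x) == length(u))
  consistent <- function(i) {
    if (length(i) < 2) return(TRUE)     # single values are always consistent
    w <- 1 / u[i]^2
    y <- sum(w * x[i]) / sum(w)
    pchisq(sum(w * (x[i] - y)^2), df = length(i) - 1, lower.tail = FALSE) >= p
  }
  for (n in length(x):1) {
    subsets <- combn(length(x), n)      # one candidate subset per column
    ok <- apply(subsets, 2, consistent)
    if (any(ok)) return(t(subsets[, ok, drop = FALSE]))  # one subset per row
  }
}
```

For instance, with x = c(10.0, 10.1, 9.9, 12.0) and u = rep(0.1, 4) the full set fails, and the only consistent subset of size 3 is c(1, 2, 3). Because every subset of each size may be examined, the work can grow as 2^length(x), so complete enumeration suits small data sets.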

There may be more than one consistent subset of size n. If so, LCS returns all such subsets unless simplify is TRUE, in which case LCS prints a short warning and returns the subset with the smallest standard uncertainty for the Graybill-Deal weighted mean, assuming large degrees of freedom in u.
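The selection rule for simplify=TRUE can be illustrated with a short sketch. It assumes that "large degrees of freedom in u" means treating u as known, so the Graybill-Deal mean is sum(x/u^2)/sum(1/u^2) with standard uncertainty 1/sqrt(sum(1/u^2)). The helpers gd_mean and pick_lowest_u are hypothetical and not part of the package.

```r
# Hypothetical sketch (not the metRology source): Graybill-Deal weighted mean
# and its standard uncertainty, treating u as known (large degrees of freedom).
gd_mean <- function(x, u) {
  w <- 1 / u^2
  c(mean = sum(w * x) / sum(w), u = sqrt(1 / sum(w)))
}

# Given a matrix of equal-size subsets (one subset of indices per row),
# return the row whose Graybill-Deal mean has the smallest uncertainty.
pick_lowest_u <- function(subsets, x, u) {
  u_gd <- apply(subsets, 1, function(i) gd_mean(x[i], u[i])[["u"]])
  subsets[which.min(u_gd), ]
}
```

With x = c(1, 5, 9), u = c(1, 0.5, 2) and candidate pairs given as rows c(1, 2), c(2, 3) and c(1, 3), the first pair has the smallest uncertainty (about 0.45) and is returned. which.min picks the first row on an exact tie; how LCS itself breaks exact ties is not stated here.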

verbose controls the level of reporting. If TRUE, LCS prints the progress of the search.

The general idea of a largest consistent subset as implemented here was suggested by Cox (2007), though at least one other related method has been suggested by Heydorn (2006). It has, however, been criticised as an estimator (Toman and Possolo, 2009); see Warning.

References

Cox, M. G. (2007) The evaluation of key comparison data: determining the largest consistent subset. Metrologia 44, 187-200.

Heydorn, K. (2006) The determination of an accepted reference value from proficiency data with stated uncertainties. Accred. Qual. Assur. 10, 479-484.

Toman, B. and Possolo, A. (2009) Laboratory effects models for interlaboratory comparisons. Accred. Qual. Assur. 14, 553-563.

Examples

library(metRology)
data(Pb)
# U/k converts expanded uncertainties to standard uncertainties
with(Pb, LCS(value, U/k))