errorStats: Compute error components of k-NN imputations

Description

Error properties of estimates derived from imputation differ from those of regression-based estimates because the two methods include a different mix of error components. This function computes a partitioning of error statistics as proposed by Stage and Crookston (2007).

Usage

errorStats(mahal,...,scale=FALSE,pzero=0.1,plg=0.5,seeMethod="lm")

Value

A list that contains several data frames. The column names of each are a combination of the name of the object used to compute the statistics and the name of the statistic. The row names correspond to the Y-variables of the first argument. The data frame names are as follows:

common: statistics used to compute other statistics.
name of first argument: error statistics for the first yai object.
names of ... arguments: error statistics for each of the remaining yai objects, if any.

The statistics are as follows:
see: standard error of estimate for individual regressions fit for corresponding Y-variables.
rmmsd0: root mean square difference for imputations based on method="mahalanobis" (always based on the first argument to the function).
mlf: square root of the model lack of fit: \(\sqrt{\mathrm{see}^2 - \mathrm{rmmsd0}^2/2}\).
rmsd: root mean square error.
rmsdlg: root mean square error of the observations with larger distances.
sei: standard error of imputation: \(\sqrt{\mathrm{rmsd}^2 - \mathrm{rmmsd0}^2/2}\).
dstc: distance component: \(\sqrt{\mathrm{rmsd}^2 - \mathrm{rmmsd0}^2}\).

Note that, unlike in Stage and Crookston (2007), all statistics reported here are in natural units rather than squared units.
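The derived components mlf, sei, and dstc are simple functions of see, rmmsd0, and rmsd. The base-R sketch below shows that arithmetic in natural units. It uses a hypothetical helper, errorComponents(), and made-up input values; it is an illustration of the formulas above, not the internal code of errorStats.

```r
# Hypothetical helper (not part of yaImpute): combine per-variable summary
# statistics into the derived components listed above. Inputs are numeric
# vectors in natural (unsquared) units, one element per Y-variable.
errorComponents <- function(see, rmmsd0, rmsd) {
  half0 <- rmmsd0^2 / 2
  data.frame(
    mlf  = sqrt(see^2  - half0),    # model lack of fit
    sei  = sqrt(rmsd^2 - half0),    # standard error of imputation
    dstc = sqrt(rmsd^2 - rmmsd0^2)  # distance component
  )
}

# Made-up values for two Y-variables, for illustration only
comp <- errorComponents(see = c(5, 2), rmmsd0 = c(4, 2), rmsd = c(6, 3))
print(comp)
```

If a difference under a square root is negative, sqrt() returns NaN with a warning; the sketch does not attempt to handle that case.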

Arguments

mahal: An object of class yai computed with method="mahalanobis".
...: Other objects of class yai for which statistics are desired. All objects should be computed from the same data and variables as the first argument.
scale: When TRUE, the errors are scaled by their respective standard deviations.
pzero: The lower-tail p-value used to pick reference observations that are essentially zero distance from each other (used to compute rmmsd0).
plg: The upper tail p-value used to pick reference observations that are substantially distant from each other (used to compute rmsdlg).
seeMethod: Method used to compute SEE: seeMethod="lm" uses lm and seeMethod="gam" uses gam. In both cases, the model formula is a simple linear combination of the X-variables.
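The seeMethod, pzero, and plg arguments can be illustrated in base R. The sketch below is an assumption-laden illustration, not the errorStats implementation. It computes an SEE for one simulated Y-variable with lm() on a simple linear combination of X-variables (analogous to seeMethod="lm"), reporting the residual standard error via sigma(); whether errorStats uses the same degrees-of-freedom convention is an assumption. It then uses quantile() to split hypothetical distances d into a near-zero lower tail (pzero) and a distant upper tail (plg), one plausible reading of how these arguments select observations. All data and the names d and diffs are made up.

```r
set.seed(1)
n <- 50

# Simulated X-variables and one Y-variable (made-up data)
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
y <- 2 + 3 * X$x1 - X$x2 + rnorm(n, sd = 0.5)

# SEE from a linear regression on the X-variables
fit <- lm(y ~ x1 + x2, data = data.frame(y = y, X))
see <- sigma(fit)  # residual standard error, sqrt(RSS / (n - p))

# Hypothetical neighbor distances and observed-minus-imputed differences
d     <- rexp(n)
diffs <- rnorm(n)

pzero <- 0.1
plg   <- 0.5
near  <- d <= quantile(d, pzero)    # lower tail: near-zero distances
far   <- d >= quantile(d, 1 - plg)  # upper tail: larger distances

# Root mean square differences within each subset
rmsd_near <- sqrt(mean(diffs[near]^2))
rmsd_far  <- sqrt(mean(diffs[far]^2))
c(see = see, rmsd_near = rmsd_near, rmsd_far = rmsd_far)
```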

Author

Nicholas L. Crookston ncrookston.fs@gmail.com
Albert R. Stage

Details

See Stage and Crookston (2007), https://academic.oup.com/forestscience/article/53/1/62/4604364, for the derivation and interpretation of these error components.

References

Stage, A.R.; Crookston, N.L. (2007). Partitioning error components for accuracy-assessment of near neighbor methods of imputation. For. Sci. 53(1):62-72. https://academic.oup.com/forestscience/article/53/1/62/4604364

Examples

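library(yaImpute)

data(TallyLake)

# Variances of the Y-variables; see column A of Table 3 in Stage and Crookston (2007)
diag(cov(TallyLake[, 1:8]))

# Imputation using Mahalanobis distance (required as the first argument)
mal <- yai(x = TallyLake[, 9:29], y = TallyLake[, 1:8],
           noTrgs = TRUE, method = "mahalanobis")

# Imputation using most similar neighbor (MSN) distance, for comparison
msn <- yai(x = TallyLake[, 9:29], y = TallyLake[, 1:8],
           noTrgs = TRUE, method = "msn")

# Statistic "see" for "mal" matches column B of Table 3 (when squared and scaled).
# Other columns do not match exactly because Stage and Crookston used
# different software to compute their values.
errorStats(mal, msn)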

Run the code above in your browser using DataLab