adhoc (version 1.1)

adhocTHR: Calculation of an ad hoc distance threshold for DNA barcoding identification

Description

This function uses the output of checkDNAbcd to quantify the relative identification errors obtained for a particular reference library at a set of arbitrary distance thresholds. It then performs a linear (or polynomial) regression of these errors on the thresholds and calculates the ad hoc distance threshold corresponding to an expected relative identification error. The method is described by Virgilio et al. (2012).
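The fitting step can be illustrated with a minimal base R sketch. It does not call adhocTHR: the thresholds and RE values are synthetic, the names thres, RE, err_prob and thr are chosen for illustration only, and the inversion is simply the algebraic solution of a fitted straight line, which may differ in detail from what the package does internally.

```r
# Synthetic (made-up) relative identification errors at 30 thresholds.
# The data are noise-free so that the expected result is easy to check.
thres <- seq(0, 0.15, length.out = 30)
RE <- 0.01 + 0.8 * thres

# Linear regression of RE on the distance threshold
fit <- lm(RE ~ thres)
b <- unname(coef(fit))          # b[1] = intercept, b[2] = slope

# Solve err_prob = b[1] + b[2] * thr for the ad hoc threshold
err_prob <- 0.05
thr <- (err_prob - b[1]) / b[2]
thr
```

With real data the points scatter around the line, so the returned threshold is an estimate whose quality depends on how well the regression fits.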

Usage

adhocTHR(a, NbrTh = 30, ErrProb = 0.05, Ambig = "incorrect", Reg = "linear")

Arguments

a
the output of the function checkDNAbcd.
NbrTh
the number of arbitrary distance thresholds used for the fitting (30 by default).
ErrProb
a value between 0 and 1 setting the relative identification error probability (5% by default).
Ambig
"incorrect", "correct" or "ignore" to treat ambiguous identifications as incorrect, correct, or to ignore them (default is "incorrect").
Reg
"linear" or "polynomial" to perform a linear or a polynomial fitting (default is "linear").

Value

"adhocTHR" returns a list of 7 components:
BM
a data.frame describing the best matches obtained for each query: distance (distBM), identification (idBM), evaluation of the identification (IDeval) according to Sonet et al. (2013).
IDcheck
a data.frame describing the identification success at each arbitrary threshold: numbers of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN), the relative identification error (RE), the overall identification error (OE), the accuracy and the precision (Sonet et al. 2013).
reg
an object of class "lm" describing the regression performed by "adhocTHR".
ErrProb
the value of the relative identification error probability fixed as an argument by the user (default is 5%).
THR
the value of the ad hoc distance threshold producing the desired probability (ErrProb) of relative identification error.
redflagged
lists all ambiguous matches obtained for each ambiguous identification: the number of species involved in the ambiguous identification (Nb_species), the number of reference sequences with the same species name as the query (Nb_conspecific_seq), the number of reference sequences with a different species name than the query (Nb_heteroallospecific_seq) and the labels of all best matches involved in the ambiguous identification (List_of_best-matches).
redflaggedSP
lists all species names that are involved in the same ambiguous identification.
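The error measures stored in IDcheck can be sketched from the four counts as follows. The formulas are an assumption based on the definitions commonly given with this method (Virgilio et al. 2012); the helper id_metrics is hypothetical and not part of the package, and the counts are made up.

```r
# Hypothetical helper: derive the error measures from TP, FP, TN and FN.
# Assumed definitions (check them against the cited papers before relying on them):
#   RE        = FP / (TP + FP)
#   OE        = (FP + FN) / (TP + FP + TN + FN)
#   accuracy  = (TP + TN) / (TP + FP + TN + FN)
#   precision = TP / (TP + FP)
id_metrics <- function(TP, FP, TN, FN) {
  total <- TP + FP + TN + FN
  data.frame(
    RE = FP / (TP + FP),
    OE = (FP + FN) / total,
    accuracy = (TP + TN) / total,
    precision = TP / (TP + FP)
  )
}

# Made-up counts for a single threshold
m <- id_metrics(TP = 80, FP = 5, TN = 10, FN = 5)
m
```

Under these definitions RE and precision sum to 1, and OE and accuracy sum to 1.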

Details

NbrTh: By default, the function samples 30 distance thresholds equidistantly distributed between 0.0 and the largest distance observed between a query and its best match in the reference library. Using more arbitrary distance thresholds may improve the fitting but will also require more computing time.
ErrProb: By default, the function sets an estimated error probability of 5%. In some cases, the selected relative identification error (e.g. 5%) cannot be reached, even when using the most restrictive distance threshold (viz. when setting distance threshold = 0.00). See below.
Ambig: We recommend using this option with caution. If "correct", it will treat all red-flagged species involved in the same ambiguous identification as a single species. Setting this option to "ignore" will allow the user to evaluate the consequences of the ambiguous identifications on the relative identification errors and hence on the estimated ad hoc distance threshold.
Reg: The user can estimate the ad hoc distance threshold on the basis of a polynomial regression (rather than a linear one). For this, check that the package polynom (Venables et al. 2013) is installed.
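The threshold grid, the polynomial alternative and the case of an unreachable ErrProb can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the best-match distances and RE values are made up, a cubic is assumed only because the example on this page uses four regression coefficients, and base R's polyroot is used here instead of the polynom package.

```r
# Hypothetical best-match distances for eight queries (made-up values)
dist_bm <- c(0.000, 0.004, 0.010, 0.013, 0.021, 0.035, 0.052, 0.080)

# NbrTh equidistant thresholds between 0 and the largest best-match distance
nbr_th <- 30
thres <- seq(0, max(dist_bm), length.out = nbr_th)

# Made-up, noise-free cubic RE curve, then a cubic fit with raw coefficients
RE <- 0.01 + 0.5 * thres + 2 * thres^2 + 10 * thres^3
fit <- lm(RE ~ poly(thres, 3, raw = TRUE))
b <- unname(coef(fit))

err_prob <- 0.05
if (b[1] > err_prob) {
  # The fitted error at distance 0 already exceeds the target
  thr <- NA_real_
  message("err_prob cannot be reached, even at a threshold of 0")
} else {
  # Solve b[1] + b[2]*x + b[3]*x^2 + b[4]*x^3 = err_prob
  roots <- polyroot(c(b[1] - err_prob, b[2:4]))
  real_roots <- Re(roots)[abs(Im(roots)) < 1e-8 & Re(roots) >= 0]
  thr <- min(real_roots)   # smallest non-negative real root
}
thr
```

Taking the smallest non-negative real root is a design choice for this sketch; with a non-monotonic fit, several thresholds could produce the same expected error.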

References

Sonet G, Jordaens K, Nagy ZT, Breman FC, De Meyer M, Backeljau T & Virgilio M (2013) Adhoc: an R package to calculate ad hoc distance thresholds for DNA barcoding identification. ZooKeys 365: 329-336. http://zookeys.pensoft.net/articles.php?id=3057.
Venables B, Hornik K, Maechler M (2013) polynom: a collection of functions to implement a class for univariate polynomial manipulations. R package version 1.3-7. http://cran.r-project.org/web/packages/polynom/index.html.
Virgilio M, Jordaens K, Breman FC, Backeljau T, De Meyer M (2012) Identifying insects with incomplete DNA barcode libraries, African Fruit flies (Diptera: Tephritidae) as a test case. PLoS ONE 7(2): e31581. doi: 10.1371/journal.pone.0031581.

See Also

checkDNAbcd

Examples

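library(adhoc)

# Load the example data set and estimate the ad hoc threshold (default settings)
data(tephdata)
out1 <- checkDNAbcd(tephdata)
out2 <- adhocTHR(out1)

# Plot the relative identification error (RE) against the distance thresholds
layout(matrix(1, 1, 1))
par(font.sub = 8)
plot(RE ~ thres, out2$IDcheck, ylim = c(0, max(c(out2$IDcheck$RE, out2$ErrProb))), xlab = NA, ylab = NA)
title(main = "Ad hoc threshold", xlab = "Distance", ylab = "Relative identification error (RE)")
title(sub = paste("For a RE of", round(out2$ErrProb, 4), "use a threshold of", round(out2$THR, 4)))

# Add the fitted curve; coefficients absent from a linear fit are set to 0
regcoef <- coef(out2$reg)[1:4]
regcoef[is.na(regcoef)] <- 0
curve(regcoef[1] + regcoef[2] * x + regcoef[3] * x^2 + regcoef[4] * x^3, add = TRUE)

# Mark the requested error probability and the corresponding threshold
segments(-1, out2$ErrProb, out2$THR, out2$ErrProb)
segments(out2$THR, -1, out2$THR, out2$ErrProb)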
