avedist: Average minimal distance between batches

Description

This metric measures how far apart batches are: for each pair of batches, it averages the distance from every observation to its nearest observation in the other batch.

Usage

avedist(xba, batch)

Arguments

xba

matrix. The covariate matrix, raw or after batch effect adjustment. Observations in rows, variables in columns.

batch

factor. Batch variable. Each factor level (or 'category') corresponds to one of the batches. For example, if there are four batches, this variable would have four factor levels and observations with the same factor level would belong to the same batch.

Value

A single numeric value: the value of the metric.

Details

For two batches j and j* (see the next paragraph for the case with more than two batches): 1) for each observation in batch j, the Euclidean distance to the nearest observation in batch j* is calculated; 2) the roles of j and j* are switched and step 1) is repeated; 3) the average is taken over all n_j + n_j* minimal distances, where n_j and n_j* are the numbers of observations in batches j and j*.
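The two-batch procedure can be sketched in base R as follows. This is an illustrative sketch only, not the package's implementation: the helper names `min_dists` and `avedist_two` are hypothetical, and the inputs are assumed to be numeric matrices (observations in rows) that have already been standardized.

```r
# Hypothetical helper: for each row of 'from', the Euclidean distance
# to its nearest row in 'to'.
min_dists <- function(from, to) {
  apply(from, 1, function(x) {
    # t(to) has one column per observation in 'to'; subtracting x
    # recycles it down each column.
    min(sqrt(colSums((t(to) - x)^2)))
  })
}

# Hypothetical helper: the two-batch metric.
avedist_two <- function(xa, xb) {
  # Steps 1) and 2): minimal distances in both directions.
  # Step 3): average over all n_j + n_j* minimal distances.
  mean(c(min_dists(xa, xb), min_dists(xb, xa)))
}

# Toy data: batch A has two observations, batch B has one.
xa <- rbind(c(0, 0), c(0, 1))
xb <- rbind(c(3, 0))
avedist_two(xa, xb)
```

In the toy data the three minimal distances are 3, sqrt(10) and 3, so the result is their mean.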

For more than two batches: 1) the metric is calculated as described above for every possible pair of batches; 2) the weighted average of the values from step 1) is calculated, with each pair's weight proportional to the sum of the sample sizes of its two batches.
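As a small worked example of the weighting, consider three batches. The batch sizes and pairwise metric values below are made up for illustration, and the variable names are hypothetical.

```r
# Hypothetical batch sizes
n <- c(A = 10, B = 20, C = 30)

# All pairs of batches; columns are A-B, A-C, B-C
pairs <- combn(names(n), 2)

# Hypothetical two-batch metric values, in the same order as the pairs
pair_values <- c(1.0, 2.0, 3.0)

# Weight of each pair: sum of the sample sizes of its two batches
w <- unname(n[pairs[1, ]] + n[pairs[2, ]])   # 30, 40, 50

# Weighted average over all pairs
overall <- weighted.mean(pair_values, w)
overall
```

Here the result is (30 * 1 + 40 * 2 + 50 * 3) / 120 = 260 / 120, so pairs involving larger batches pull the overall value toward their own.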

The variables are standardized before the calculation to make the metric independent of scale.
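The effect of standardization can be illustrated with base R's `scale()`. This sketch assumes that standardizing means centering each variable and scaling it to unit standard deviation across all observations; the exact procedure used by the function is not stated here.

```r
set.seed(1)

# Toy covariate matrix: 10 observations, 2 variables
x <- matrix(rnorm(20), nrow = 10, ncol = 2)

# Same data with the second variable in different units
x_rescaled <- x
x_rescaled[, 2] <- x_rescaled[, 2] * 1000

# Column-wise standardization (assumed form)
z1 <- scale(x)
z2 <- scale(x_rescaled)

# The standardized matrices agree, so any distance computed from
# them is unaffected by the change of units
all.equal(z1, z2, check.attributes = FALSE)
```

Because both standardized matrices are the same up to rounding, Euclidean distances between observations do not depend on the original measurement scale.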

References

Lazar, C., Meganck, S., Taminau, J., Steenhoff, D., Coletta, A., Molter, C., Weiss-Solís, D. Y., Duque, R., Bersini, H., Nowé, A. (2012). Batch effect removal methods for microarray gene expression data integration: a survey. Briefings in Bioinformatics 14(4):469-490, <10.1093/bib/bbs037>.

Hornung, R., Boulesteix, A.-L., Causeur, D. (2016). Combining location-and-scale batch effect adjustment with data cleaning by latent factor adjustment. BMC Bioinformatics 17:27, <10.1186/s12859-015-0870-z>.

Examples

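# Assumes the bapred package, which provides avedist() and the autism data
library(bapred)

# Load the example data: covariate matrix X and batch factor 'batch'
data(autism)

# Average minimal distance between batches on the raw data
avedist(xba = X, batch = batch)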