gl.report.replicates: Identify replicated individuals

Description

Identify replicated individuals

Usage

gl.report.replicates(
  x,
  loc_threshold = 100,
  perc_geno = 0.95,
  plot.out = TRUE,
  plot_theme = theme_dartR(),
  plot_colors = c("#2171B5", "#6BAED6"),
  bins = 100,
  verbose = NULL
)

Value

A list with three elements:

table.rep: A data frame of pairwise results: the percentage of identical genotypes between two individuals, the number of loci used in the comparison, and the missing data for each individual.
ind.list.drop: A vector of replicated individuals to be dropped. The replicated individual with the least missing data is reported.
ind.list.rep: A list of each individual that has replicates in the dataset, the names of its replicates, and the percentage of identical genotypes.

Arguments

x: Name of the genlight object containing the SNP data [required].
loc_threshold: Minimum number of loci required to assess whether two individuals are replicates [default 100].
perc_geno: Minimum percentage of genotypes that must be identical between two individuals, expressed as a proportion [default 0.95].
plot.out: Specify if plot is to be produced [default TRUE].
plot_theme: User specified theme [default theme_dartR()].
plot_colors: Vector with two color names for the borders and fill [default c("#2171B5", "#6BAED6")].
bins: Number of bins to display in histograms [default 100].
verbose: Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].
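
To make the roles of loc_threshold and perc_geno concrete, the comparison for a single pair of individuals can be sketched in base R. This is only an illustration under stated assumptions: genotypes are coded 0/1/2 with NA for missing calls, loci missing in either individual are skipped, and the helper name compare_pair is hypothetical. It is not the package's C++ code and may differ from it in detail.

```r
# Hypothetical helper, not part of dartR: compare two genotype vectors
# coded 0/1/2 (NA = missing) and apply both thresholds.
compare_pair <- function(g1, g2, loc_threshold = 100, perc_geno = 0.95) {
  shared <- !is.na(g1) & !is.na(g2)   # loci genotyped in both individuals
  n_loci <- sum(shared)
  # Proportion of shared loci with identical genotypes (NA if none are shared)
  prop_same <- if (n_loci > 0) mean(g1[shared] == g2[shared]) else NA_real_
  list(
    n_loci = n_loci,
    prop_same = prop_same,
    # Flag as replicate only if enough loci were compared AND enough of them match
    replicate = n_loci >= loc_threshold && !is.na(prop_same) && prop_same >= perc_geno
  )
}

# Toy data: ind_b copies ind_a except for 5 genotype errors and 20 missing calls
set.seed(1)
ind_a <- sample(0:2, 200, replace = TRUE)
ind_b <- ind_a
ind_b[1:5] <- (ind_b[1:5] + 1) %% 3   # 5 mismatching loci
ind_b[181:200] <- NA                  # 20 loci missing in ind_b

res_pair <- compare_pair(ind_a, ind_b)

# Same pair restricted to 50 loci: too few loci to pass loc_threshold = 100
res_short <- compare_pair(ind_a[1:50], ind_b[1:50])
```

In this toy case the full comparison uses 180 shared loci, of which 175 match, so the pair passes both thresholds. The 50-locus comparison is not flagged because it falls below loc_threshold, however similar the genotypes are.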

Author

Custodian: Luis Mijangos -- Post to https://groups.google.com/d/forum/dartr

Details

This function uses a C++ implementation, so the package Rcpp needs to be installed. The function is compiled on the first run and is fast on subsequent runs.

Ideally, in a large dataset with related and unrelated individuals and several replicated individuals, such as in a capture/mark/recapture study, the first histogram should have four "peaks". The first peak should represent unrelated individuals, the second peak should correspond to second-degree relationships (such as cousins), the third peak should represent first-degree relationships (like parent/offspring and full siblings), and the fourth peak should represent replicated individuals.

In order to ensure that replicated individuals are properly identified, it's important to have a clear separation between the third and fourth peaks in the second histogram. This means that there should be bins with zero counts between these two peaks.
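
The separation criterion can also be checked numerically rather than by eye. The sketch below is a base R illustration using made-up similarity values, not real data. The helper has_gap_below_top is hypothetical, and it assumes the highest cluster of values is the replicate peak. It bins the values with hist() and tests whether an empty bin separates that top cluster from the values below it. It does not reproduce how gl.report.replicates builds its own histograms.

```r
# Hypothetical helper: is the highest cluster of values (assumed to be the
# replicate peak) separated from lower values by at least one empty bin?
has_gap_below_top <- function(prop_same, bins = 100) {
  h <- hist(prop_same, breaks = seq(0, 1, length.out = bins + 1), plot = FALSE)
  occupied <- h$counts > 0
  top <- max(which(occupied))   # highest occupied bin
  # Walk down to the lowest bin of the contiguous top cluster
  while (top > 1 && occupied[top - 1]) top <- top - 1
  # The bin below the cluster is now empty (or does not exist); a gap counts
  # only if some occupied bin lies further down
  top > 1 && any(occupied[seq_len(top - 1)])
}

# Made-up values: first-degree pairs near 0.80-0.88, replicates near 0.97-1.00
separated <- c(seq(0.805, 0.875, by = 0.01), 0.975, 0.985, 0.995)
# Made-up values with no break between 0.80 and 1.00
overlapping <- seq(0.805, 0.995, by = 0.01)

gap_separated <- has_gap_below_top(separated)
gap_overlapping <- has_gap_below_top(overlapping)
```

Here gap_separated is TRUE and gap_overlapping is FALSE. When no gap is found, the perc_geno threshold has no natural place to sit between relatives and replicates, so results should be interpreted with care.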

Examples

# \donttest{
res_rep <- gl.report.replicates(platypus.gl, loc_threshold = 500,
                                perc_geno = 0.85)
# }