Learn R Programming

dartR.base (version 1.0.5)

gl.report.replicates: Identify replicated individuals

Description

Identify replicated individuals

Usage

gl.report.replicates(
  x,
  loc_threshold = 100,
  perc_geno = 0.95,
  plot.out = TRUE,
  plot_theme = theme_dartR(),
  plot_colors = c("#2171B5", "#6BAED6"),
  bins = 100,
  verbose = NULL
)

Value

A list with three elements:

  • table.rep: A dataframe with pairwise results of percentage of same genotypes between two individuals, the number of loci used in the comparison and the missing data for each individual.

  • ind.list.drop: A vector of replicated individuals to be dropped. Replicated individual with the least missing data is reported.

  • ind.list.rep: A list of of each individual that has replicates in the dataset, the name of the replicates and the percentage of the same genotype.

Arguments

x

Name of the genlight object containing the SNP data [required].

loc_threshold

Minimum number of loci required to asses that two individuals are replicates [default 100].

perc_geno

Mimimum percentage of genotypes in which two individuals should be the same [default 0.95].

plot.out

Specify if plot is to be produced [default TRUE].

plot_theme

User specified theme [default theme_dartR()].

plot_colors

Vector with two color names for the borders and fill [default c("#2171B5", "#6BAED6")].

bins

Number of bins to display in histograms [default 100].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].

Author

Custodian: Luis Mijangos -- Post to https://groups.google.com/d/forum/dartr

Details

This function uses an C++ implementation, so package Rcpp needs to be installed and it is therefore fast (once it has compiled the function after the first run).

Ideally, in a large dataset with related and unrelated individuals and several replicated individuals, such as in a capture/mark/recapture study, the first histogram should have four "peaks". The first peak should represent unrelated individuals, the second peak should correspond to second-degree relationships (such as cousins), the third peak should represent first-degree relationships (like parent/offspring and full siblings), and the fourth peak should represent replicated individuals.

In order to ensure that replicated individuals are properly identified, it's important to have a clear separation between the third and fourth peaks in the second histogram. This means that there should be bins with zero counts between these two peaks.

See Also

Other report functions: gl.report.pa()

Examples

Run this code
# \donttest{
res_rep <- gl.report.replicates(platypus.gl, loc_threshold = 500, 
perc_geno = 0.85)
# }

Run the code above in your browser using DataLab