Identify replicated individuals
gl.report.replicates(
x,
loc_threshold = 100,
perc_geno = 0.95,
plot.out = TRUE,
plot_theme = theme_dartR(),
plot_colors = c("#2171B5", "#6BAED6"),
bins = 100,
verbose = NULL
)
A list with three elements:
table.rep: A data frame with the pairwise percentage of identical genotypes between two individuals, the number of loci used in the comparison, and the missing data for each individual.
ind.list.drop: A vector of replicated individuals to be dropped. The replicated individual with the least missing data is reported.
ind.list.rep: A list of each individual that has replicates in the dataset, the names of its replicates, and the percentage of identical genotypes.
Name of the genlight object containing the SNP data [required].
Minimum number of loci required to assess whether two individuals are replicates [default 100].
Minimum proportion of genotypes that must be identical between two individuals for them to be considered replicates [default 0.95].
Specify if plot is to be produced [default TRUE].
User specified theme [default theme_dartR()].
Vector with two color names for the borders and fill [default c("#2171B5", "#6BAED6")].
Number of bins to display in histograms [default 100].
Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].
Custodian: Luis Mijangos -- Post to https://groups.google.com/d/forum/dartr
This function uses a C++ implementation, so the package Rcpp must be installed. The function is compiled during the first run and is fast on subsequent runs.
Ideally, in a large dataset with related and unrelated individuals and several replicated individuals, such as in a capture/mark/recapture study, the first histogram should have four "peaks". The first peak should represent unrelated individuals, the second peak should correspond to second-degree relationships (such as cousins), the third peak should represent first-degree relationships (like parent/offspring and full siblings), and the fourth peak should represent replicated individuals.
In order to ensure that replicated individuals are properly identified, it's important to have a clear separation between the third and fourth peaks in the second histogram. This means that there should be bins with zero counts between these two peaks.
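The comparison behind these histograms can be illustrated with a minimal base R sketch. It is not the package's C++ implementation: the helper pairwise_same() and the toy genotype matrix are hypothetical, and the sketch assumes genotypes coded 0/1/2 with NA for missing calls. It only shows how loc_threshold and perc_geno could act together on a pairwise table.

```r
# Hypothetical helper, not part of dartR: for every pair of individuals,
# count the loci scored in both and the proportion of identical calls.
pairwise_same <- function(geno, loc_threshold = 100, perc_geno = 0.95) {
  pairs <- t(utils::combn(nrow(geno), 2))
  res <- data.frame(
    ind1 = rownames(geno)[pairs[, 1]],
    ind2 = rownames(geno)[pairs[, 2]],
    nloc = NA_integer_,
    perc = NA_real_,
    stringsAsFactors = FALSE
  )
  for (k in seq_len(nrow(pairs))) {
    a <- geno[pairs[k, 1], ]
    b <- geno[pairs[k, 2], ]
    both <- !is.na(a) & !is.na(b)          # loci genotyped in both individuals
    res$nloc[k] <- sum(both)
    res$perc[k] <- if (any(both)) mean(a[both] == b[both]) else NA_real_
  }
  # A pair is flagged only if enough loci were compared AND enough of them match.
  res$replicate <- !is.na(res$perc) &
    res$nloc >= loc_threshold &
    res$perc >= perc_geno
  res
}

# Toy data (assumed for illustration): 4 individuals, 200 loci.
set.seed(1)
n_loc <- 200
geno <- matrix(sample(0:2, 4 * n_loc, replace = TRUE), nrow = 4,
               dimnames = list(c("A", "B", "C", "A_rep"), NULL))
geno["A_rep", ] <- geno["A", ]                    # A_rep is a replicate of A
geno["A_rep", 1:5] <- (geno["A", 1:5] + 1) %% 3   # five simulated genotyping errors
geno["B", 1:20] <- NA                             # some missing data

out <- pairwise_same(geno, loc_threshold = 100, perc_geno = 0.95)
print(out[out$replicate, ])
```

In this toy set, only the A / A_rep pair is flagged: 195 of its 200 shared loci match (97.5%), while unrelated pairs sit far below the threshold. Real data will be noisier, which is why a gap of empty bins between the last two peaks matters when choosing perc_geno.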
Other report functions:
gl.report.pa()
# \donttest{
res_rep <- gl.report.replicates(platypus.gl, loc_threshold = 500,
perc_geno = 0.85)
# }
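A possible follow-up, sketched under assumptions: the returned list can be inspected and ind.list.drop passed to gl.drop.ind() to remove the flagged individuals. The sketch assumes the function is loaded from the dartR package, that ind.list.drop holds individual names as they appear in the genlight object, and that plot.out = FALSE is acceptable when only the tables are needed. The names to_drop and gl_clean are illustrative.

```r
# Sketch only: runs the report, inspects the result, and drops flagged
# individuals. Skipped entirely if dartR is not installed.
if (requireNamespace("dartR", quietly = TRUE)) {
  library(dartR)
  res_rep <- gl.report.replicates(platypus.gl, loc_threshold = 500,
                                  perc_geno = 0.85, plot.out = FALSE)

  # The three documented elements: table.rep, ind.list.drop, ind.list.rep
  str(res_rep, max.level = 1)
  print(head(res_rep$table.rep))

  # Assumption: ind.list.drop contains names matching indNames(platypus.gl)
  to_drop <- res_rep$ind.list.drop
  if (length(to_drop) > 0) {
    gl_clean <- gl.drop.ind(platypus.gl, ind.list = to_drop)
  }
}
```

Checking table.rep before dropping anything is a sensible precaution, since pairs close to the perc_geno threshold may be close relatives and not true replicates.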