This script calculates the minor allele frequency for each locus and updates the locus metadata for FreqHomRef, FreqHomSnp, FreqHets and MAF (if it exists). It then uses the updated metadata for MAF to filter loci.
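As an illustration of the statistics named above, the following base-R sketch computes them for a toy genotype matrix. It assumes genotypes are coded 0 (homozygous reference), 1 (heterozygous) and 2 (homozygous alternate), with NA for missing calls. The helper `locus_freqs()` is hypothetical and is not part of dartR, whose internal implementation may differ.

```r
# Toy genotype matrix: rows = individuals, columns = loci.
# Assumed coding: 0 = hom. reference, 1 = het, 2 = hom. alternate, NA = missing.
geno <- matrix(
  c(0, 0,  1, 2,
    0, 1,  1, 2,
    0, 0, NA, 2,
    0, 0,  2, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("ind", 1:4), paste0("loc", 1:4))
)

# Hypothetical helper (not a dartR function): per-locus genotype
# frequencies and minor allele frequency, ignoring missing calls.
locus_freqs <- function(g) {
  n_called <- colSums(!is.na(g))
  alt_freq <- colMeans(g, na.rm = TRUE) / 2   # frequency of the alternate allele
  data.frame(
    FreqHomRef = colSums(g == 0, na.rm = TRUE) / n_called,
    FreqHets   = colSums(g == 1, na.rm = TRUE) / n_called,
    FreqHomSnp = colSums(g == 2, na.rm = TRUE) / n_called,
    MAF        = pmin(alt_freq, 1 - alt_freq) # the rarer of the two alleles
  )
}

freqs <- locus_freqs(geno)

# Keep loci whose MAF is at least the threshold (illustrative value).
threshold <- 0.2
keep <- freqs$MAF >= threshold
geno_filtered <- geno[, keep, drop = FALSE]
```

In this toy example only `loc3` survives: `loc1` is monomorphic, and `loc2` and `loc4` each carry the rarer allele in just one of eight called copies.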
gl.filter.maf(
x,
threshold = 0.01,
by.pop = FALSE,
pop.limit = ceiling(nPop(x)/2),
ind.limit = 10,
recalc = FALSE,
plot.display = TRUE,
plot.theme = theme_dartR(),
plot.colors = NULL,
plot.file = NULL,
plot.dir = NULL,
bins = 25,
verbose = NULL
)
The reduced genlight dataset
x: Name of the genlight object containing the SNP data [required].
threshold: Threshold MAF. Loci with a MAF less than the threshold will be removed. If a value > 1 is provided, it will be interpreted as MAC (i.e. the minimum number of times an allele needs to be observed) [default 0.01].
by.pop: Whether MAF should be calculated by population [default FALSE].
pop.limit: Minimum number of populations in which MAF should be less than the threshold for a locus to be filtered out. Only used if by.pop = TRUE. The default value is half of the populations [default ceiling(nPop(x)/2)].
ind.limit: Minimum number of individuals that a population should contain to calculate MAF. Only used if by.pop = TRUE [default 10].
recalc: Recalculate the locus metadata statistics [default FALSE].
plot.display: If TRUE, histograms of minor allele frequency are displayed in the plot window [default TRUE].
plot.theme: Theme for the plot. See Details for options [default theme_dartR()].
plot.colors: List of two color names for the borders and fill of the plots [default c("#2171B5", "#6BAED6")].
plot.file: Name for the RDS binary file to save (base name only, exclude extension) [default NULL].
plot.dir: Directory in which to save files [default = working directory].
bins: Number of bins to display in histograms [default 25].
verbose: Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity].
Custodian: Luis Mijangos -- Post to https://groups.google.com/d/forum/dartr
Careful consideration needs to be given to the settings used for this
function. When the filter is applied globally (i.e. by.pop=FALSE) but
the data include multiple populations, there is a risk of removing markers
because their allele frequency is low at the global level, even though the
allele frequency for the same markers may be high within some of the
populations (especially if the per-population sample size is small).
Similarly, running this function with by.pop=TRUE is not always a sensible
choice, because alleles that are rare in one population may be very common
in others, and the possible allele frequencies depend on the sample size
within each population. Where the purpose of filtering on MAF is to remove
possibly spurious alleles (i.e. sequencing errors), it is perhaps better to
filter on the number of times an allele is observed (MAC, Minimum Allele
Count), under the assumption that an allele observed more than MAC times is
unlikely to be an error.
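The difference between global and per-population filtering can be seen in a small base-R sketch. It follows the parameter descriptions on this page (a locus is dropped when its MAF is below the threshold in at least `pop.limit` populations, and populations with fewer than `ind.limit` individuals are skipped). This is an assumption about the logic, not dartR's actual code. The helper `maf()` and all variable names are hypothetical.

```r
# Hypothetical helper: per-locus MAF for a 0/1/2-coded genotype matrix.
maf <- function(g) {
  p <- colMeans(g, na.rm = TRUE) / 2
  pmin(p, 1 - p)
}

# Three populations of 10 individuals each.
pop <- rep(c("A", "B", "C"), each = 10)
geno <- cbind(
  loc1 = c(rep(1, 10), rep(0, 20)),  # allele common in A, absent elsewhere
  loc2 = rep(c(0, 0, 0, 0, 1), 6)    # allele rare but present in every population
)

threshold <- 0.2  # illustrative values, chosen for this toy data set
ind_limit <- 10
pop_limit <- 3

# Global filter: one MAF per locus across all individuals.
global_maf  <- maf(geno)
keep_global <- global_maf >= threshold

# Per-population filter: skip populations that are too small, then count
# the populations in which each locus falls below the threshold.
pops_ok    <- names(which(table(pop) >= ind_limit))
maf_by_pop <- sapply(pops_ok, function(p) maf(geno[pop == p, , drop = FALSE]))
n_below    <- rowSums(maf_by_pop < threshold)
keep_by_pop <- n_below < pop_limit
```

Here the global filter drops both loci, whereas the per-population rule with `pop_limit = 3` keeps `loc1`, whose allele is at frequency 0.5 in population A. With the page's default of half the populations (2 here), `loc1` would be dropped as well, which shows how sensitive the outcome is to these settings.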
From v2.1, the threshold can take values > 1, in which case it is interpreted as a threshold for MAC.
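A count-based filter can be sketched in base R as follows. This is only an illustration of what a minimum allele count means, assuming 0/1/2 genotype coding with NA for missing calls. How dartR converts a threshold > 1 internally is not stated on this page, and the variable names below are hypothetical.

```r
# Toy genotypes: rows = individuals, columns = loci (0/1/2 coding, NA = missing).
geno <- cbind(
  loc1 = c(1, 0, 0, 0, 0, 0),   # alternate allele observed once
  loc2 = c(1, 1, 1, 0, 0, NA),  # alternate allele observed three times
  loc3 = c(2, 2, 2, 2, 1, 2)    # reference allele is the rare one, observed once
)

# Copies of the alternate allele and total called allele copies per locus.
alt_count <- colSums(geno, na.rm = TRUE)
n_alleles <- 2 * colSums(!is.na(geno))

# Minor allele count: copies of whichever allele is rarer at each locus.
mac <- pmin(alt_count, n_alleles - alt_count)

# Keep loci whose rarer allele is observed at least `min_count` times
# (illustrative value; mirrors "less than the threshold will be removed").
min_count <- 2
keep <- mac >= min_count
geno_filtered <- geno[, keep, drop = FALSE]
```

Only `loc2` is retained in this example. With dartR itself, the documented way to request this behaviour is to pass a value greater than 1 as `threshold`, for example `gl.filter.maf(x, threshold = 2)`.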
Other matched filter: gl.filter.callrate(), gl.filter.hamming(), gl.filter.ld(), gl.filter.locmetric(), gl.filter.monomorphs(), gl.filter.overshoot(), gl.filter.pa(), gl.filter.secondaries()
result <- gl.filter.maf(platypus.gl, threshold = 0.05, verbose = 3)
#result <- gl.filter.maf(platypus.gl, by.pop = TRUE, threshold = 0.05, verbose = 3)