site.spectrum
computes the (un)folded site frequency spectrum
of a set of aligned DNA sequences or SNPs.
site.spectrum(x, ...)
# S3 method for DNAbin
site.spectrum(x, folded = TRUE, outgroup = 1, ...)
# S3 method for loci
site.spectrum(x, folded = TRUE, ancestral = NULL, ...)
# S3 method for spectrum
plot(x, col = "red", main = NULL, ...)
site.spectrum
returns an object of class "spectrum"
which is a vector of integers (some values may be equal to zero) with
the attributes "sample.size"
and "folded"
(a logical
value) indicating which version of the spectrum has been computed.
a set of DNA sequences (as an object of class
"DNAbin"
), or an object of class "spectrum"
.
a logical specifying whether to compute the folded site
frequency spectrum (the default), or the unfolded spectrum if
folded = FALSE
.
a single integer value giving which sequence is
ancestral; ignored if folded = TRUE
.
a vector of ancestral alleles (required if
folded = FALSE
), typically from an output of
VCFloci
.
the colour of the barplot (red by default).
a character string for the title of the plot; a generic
title is given by default (use main = ""
to have no title).
further arguments passed to
barplot
, or to other mehods.
Emmanuel Paradis
Under the infinite sites model of mutation, mutations occur on distinct sites, so every segregating (polymorphic) site defines a partition of the \(n\) sequences (see Wakeley, 2009). The site frequency spectrum is a series of values where the \(i\)th element is the number of segregating sites defining a partition of \(i\) and \(n - i\) sequences. The unfolded version requires to define an ancestral state with an external (outgroup) sequence, so \(i\) varies between 1 and \(n - 1\). If no ancestral state can be defined, the folded version is computed, so \(i\) varies between 1 and \(n/2\) or \((n - 1)/2\), for \(n\) even or odd, respectively.
If folded = TRUE
, sites with more than two states are ignored
and a warning is returned giving how many were found.
If folded = FALSE
, sites with an ambiguous state at the
external sequence are ignored and a warning is returned giving how
many were found. Note that it is not checked if some sites have more
than two states.
If x
is an object of class "loci"
, the loci which are
not biallelic (e.g., SNPs) are dropped with a warning.
Wakeley, J. (2009) Coalescent Theory: An Introduction. Greenwood Village, CO: Roberts and Company Publishers.
DNAbin
for manipulation of DNA sequences in R,
haplotype
require(ape)
data(woodmouse)
(sp <- site.spectrum(woodmouse))
plot(sp)
Run the code above in your browser using DataLab