polyfreqs
An R package for Bayesian population genomics in autopolyploids
polyfreqs is an R package for the estimation of biallelic SNP frequencies, genotypes and heterozygosity in autopolyploid taxa using high-throughput sequencing data. It should work for diploids as well, but does not accommodate data sets of mixed ploidy.
polyfreqs does accept missing data: code them as 0 in the total read count matrix.
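As a minimal sketch of this coding, assume the read counts are held in two individuals-by-loci matrices and that missing entries were recorded as NA. The object names `tot_reads` and `ref_reads` and the toy values are hypothetical, and only base R is used. Setting the matching reference-read entries to 0 as well is an assumption made here for consistency, not something stated above.

```r
# Toy read-count matrices: rows are individuals, columns are loci.
# Object names and values are hypothetical, for illustration only.
tot_reads <- matrix(c(12, NA,  8,
                      10, 15, NA),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(c("ind1", "ind2"),
                                    c("loc1", "loc2", "loc3")))
ref_reads <- matrix(c( 5, NA,  8,
                       0,  7, NA),
                    nrow = 2, byrow = TRUE,
                    dimnames = dimnames(tot_reads))

# Record which entries are missing, then code them as 0.
missing <- is.na(tot_reads)
tot_reads[missing] <- 0
ref_reads[missing] <- 0  # assumption: no total reads implies no reference reads

# Sanity checks: no NAs remain, and reference reads never exceed total reads.
stopifnot(!anyNA(tot_reads), !anyNA(ref_reads), all(ref_reads <= tot_reads))
```

Keeping the logical `missing` matrix around makes it easy to mask the corresponding genotype estimates afterwards, since an entry with zero reads carries no information from the data.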
NEW: polyfreqs now has a Google Groups page. Please feel free to join the group and post any questions that you may have about the software. [Google Groups link]
Dependencies
polyfreqs uses C++ code to implement its Gibbs sampling algorithm, which will usually require the installation of additional software (depending on the operating system [OS] being used).
Windows users will need to install Rtools.
MacOSX users will need to install the Xcode Command Line Tools.
Linux users will need an up-to-date version of the GNU Compiler Collection (gcc) and the r-base-dev package.
polyfreqs relies on the R package Rcpp, which is also a good place to start for figuring out what you will need. Note that Rcpp also requires the compilation of C++ code, so make sure that the necessary compilers are installed appropriately for your OS. You can install Rcpp directly from CRAN in the usual way using the install.packages() command:
install.packages("Rcpp")
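Once Rcpp is installed, one way to confirm that the compilers are set up is to have Rcpp compile and run a trivial C++ expression with `Rcpp::evalCpp()`. This is a quick sanity check offered as a suggestion; it is not part of polyfreqs itself.

```r
# Quick toolchain check (not part of polyfreqs): compile and run a
# one-line C++ expression. evalCpp() stops with a compiler error if the
# build tools for your OS are not set up correctly.
if (requireNamespace("Rcpp", quietly = TRUE)) {
  result <- Rcpp::evalCpp("2 + 2")
  stopifnot(result == 4)
  message("C++ toolchain is working.")
} else {
  message("Rcpp is not installed; run install.packages(\"Rcpp\") first.")
}
```

If the call fails, the error message usually points at the missing piece (Rtools, the Xcode Command Line Tools, or gcc).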
Installation
polyfreqs v1.0.0 is now on CRAN: link.
You can now install it like you would any other R package:
install.packages("polyfreqs")
Installing the latest development release of polyfreqs can be done using the devtools package and the install_github() command. Install devtools using install.packages("devtools"). polyfreqs can then be installed as follows:
devtools::install_github("pblischak/polyfreqs")
Documentation
Example code and tutorials for running polyfreqs can be found in the vignette. For more details on the model underlying polyfreqs, please see the associated paper in Molecular Ecology Resources: Blischak et al. The Supplemental Material also has a walkthrough for analyzing a data set collected for autotetraploid potato (Solanum tuberosum).
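If polyfreqs is already installed, the vignettes shipped with your copy can be listed from within R using the standard `vignette()` function. This sketch makes no assumption about the vignette's name; whether a vignette is bundled may depend on how the package was installed.

```r
# List any vignettes bundled with the installed copy of polyfreqs.
if (requireNamespace("polyfreqs", quietly = TRUE)) {
  vigs <- vignette(package = "polyfreqs")
  # The results matrix has one row per vignette (possibly zero rows).
  print(vigs$results[, c("Item", "Title"), drop = FALSE])
  # To open them in a web browser instead:
  # browseVignettes("polyfreqs")
} else {
  message("polyfreqs is not installed.")
}
```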
Release notes
v1.0.2 -- Small patch updating the code for sampling genotypes during the MCMC, which was giving underflow errors when total read counts were high (~1000x coverage).
v1.0.1 -- Removed dependency on the RcppArmadillo sample() function by coding our own version (nonunif_int() in the sample_g.cpp source file). The Gibbs sampler should run a bit faster now.
v1.0.0 -- First release. Now available on CRAN.