gl2snapper: Converts a genlight object to nexus format suitable for phylogenetic analysis by Snapper (via BEAUti) @family linker

Description

Produces a nexus file contains the SNP calls and relevant PAUP command lines suitable for for the software package BEAUti.

Usage

gl2snapper(
  x,
  outfile = "snapper.nex",
  rm.autapomorphies = FALSE,
  nloc = NULL,
  outpath = NULL,
  verbose = NULL
)

Value

returns no value (i.e. NULL)

Arguments

x: Name of the genlight object containing the SNP data [required].
outfile: Name of the output file (including extension) [default "snapper.nex"].
rm.autapomorphies: Prune the loci by removing autapomorphies (not phylogentically informative), that is, SNP polymorphisms limited to only one population [default TRUE].
nloc: Number of loci to subsample to bring down computational time [default NULL]
outpath: Path where to save the output file [default global working directory or if not specified, tempdir()].
verbose: Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].

Author

Custodian: Arthur Georges (Post to https://groups.google.com/d/forum/dartr)

Details

Snapper is a phylogenetic approach implemented in Beauti/BEAST that can handle larger SNP datasets than can SNAPP. This script produces a nexus file for Beauti which allows options to be set and creates the xml file for BEAST.Although improved over SNAPP in terms of computational efficiency, Snapper remains constrained by computational times, even when implemented on high performance computers with parallel processing. Computation time of Snapper is not particularly sensitive to the number of individuals in each terminal taxon (= population), but is impacted by the number of populations and the numbers of loci. Particular attention needs to be directed at amalgamating populations where their joint taxonomic identity is without question and reducing the number of SNP loci by prudent filtering. Removing monomorphic loci, increasing the reliability of loci (read depth, reproducibility), minimizing missing data (call rate), removing multiple SNPs in a single sequence tag (secondaries) are all good options. You may wish also to remove SNP loci that are polymorphic within only a single population (autapomorphies) are all options for reducing the number of loci (rm.autapomorphies=TRUE)If computational time is still an issue (say requiring one month for a single run), following strategy is recommended. First, subsample the loci to say 100 and test the process to ensure there are no syntax or other issues (nloc=100). You do not want to wait several days or weeks running the full dataset to discover a simple syntax error or incorrectly specified parameter. Second consider running BEAST on a platform that allows multi-threading as this will dramatically reduce compute time. Note that adding threads does not always improve computational time. The optimal number of threads depends on the particular analysis. This means you have to experiment to find out how many threads give the best performance foa a computer cycle.Also, there is an overheads cost in using many threads. Instead, you could run independent snapper analyses and combine resulting log and tree files. For example, if the optimal number of threads is 8 (adding more threads reduces speed), but 8 threads gives marginal improvement over 4 threads, you can run 2 chains with 4 threads each instead and (after getting through burn-in) then combine results and get a better result than running a single chain at 8 threads. A bonus benefit from running multiple chains is that you can verify that the MCMC ends up with the same posterior distribution each time.If computational time is still an issue, run the #'analysis on a series of subsamples of loci (say nloc=300) to see if a consistent topology is obtained, then adopt that topology as the final result.Note that there is a cost to manipulating your data to achieve reasonable computation times. Omission of sequence tags that are invariant during the SNP calling process, removal of monomorphic loci generated during taxon selection, and removing autapomorphic loci will all affect branch lengths, perhaps differentially, and so compromise branch lengths and estimates of divergence times. Fortunately, the topology should be little affected.Finally, the analysis relies on certain assumptions. First is that the structure is one of a bifurcating tree and not a network. One needs to assign individuals to populations in advance of the analysis, confident that they are discrete entities and free of horizontal transfer. A second assumption is that the loci scored for SNPs are assorting independently. This is probably a reasonable assumption for SNPs derived from sparse representational sampling (e.g. DArT), but if dense SNP arrays are being used, then some form of thinning will be required. Of course, multiple SNPs on a single sequence tag will be linked, so filtering all but one SNP per sequence tag is required (gl.filter.secondaries).The workflow is

"1" Execute gl2snapper()
"2" Install beast2
"3" Run beauti in the beast2 bin
"4" Set the template to snapper [File | Template | Snapper]
"5" Load the nexus file produced by gl2snapper()
"6" Select and set the parameters you consider appropriate
"7" Save the xml file [File | Save As]
"8" Run beast
"9" Load the xml file and execute
"10" When beast is finished, examine the diagnostics with Tracer
"11" Visualize the resultant trees using DensiTree and FigTree.

If using the command line to run beast, the command is beast -threads myxmlfile. Progress can be monitored with awk (awk '{print $1, $2}' snapper.log |tail). When beast is finished, transfer the log and tree files to a windows platform and use Tracer, DensiTree and FigTree as above.gl2snapper does not work with SilicoDArT data.

References

Bryant, D., Bouckaert, R., Felsenstein, J., Rosenberg, N.A. and RoyChoudhury, A. (2012). Inferring species trees directly from biallelic genetic markers: bypassing gene trees in a full coalescent analysis. Molecular Biology and Evolution 29:1917-1932.

Rambaut A, Drummond AJ, Xie D, Baele G and Suchard MA (2018) Posterior summarisation in Bayesian phylogenetics using Tracer 1.7. Systematic Biology. syy032. doi:10.1093/sysbio/syy032

Examples

Run this code

x <- gl.filter.monomorphs(testset.gl)
gl2snapper(x, outfile="test.nex", outpath=tempdir())

Run the code above in your browser using DataLab