The phyclust package (Chen 2011) implements a novel approach that combines model-based clustering and phylogenetics to classify DNA and SNP sequences. Under an evolutionary model, the sequences in each cluster are assumed to follow a mutation process centered on an unknown ancestral sequence. Based on continuous-time Markov chain theory, mixture distributions are then built to model and classify subpopulations or population structure.
The core of the package is implemented in C. EM algorithms are used to find the maximum likelihood estimates. Initialization methods for the EM algorithms and several evolutionary models are also provided.
ms (Hudson 2002) and seq-gen (Rambaut and Grassly 1997) are two useful programs for generating coalescent trees and sequences, and both are merged into phyclust. baseml, from PAML (Yang 1997, 2007), is a program that finds a phylogenetic tree by maximizing the likelihood, and it is also ported into phyclust. The Hap-Clustering method (Tzeng 2005) for haplotype grouping is incorporated as well.
Type help(package = phyclust) to see a list of the major functions for which further documentation is available. Detailed online instructions are also available; the link is given in the ‘References’ section below.
Some C and R functions and R classes of the ape package are also required and modified in phyclust.
Wei-Chen Chen wccsnow@gmail.com
The main function is phyclust, which is controlled by an object .EMC generated by the function .EMControl. find.best finds the best solution by repeating phyclust with different initializations.
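As a rough illustration of how these pieces fit together, the sketch below fits a two-cluster model to the toy data set seq.data.toy that ships with the package, first with a single call to phyclust and then with find.best. The iteration counts passed to .EMControl are deliberately tiny so that the example runs quickly; they are an assumption made for this sketch, not recommended settings, and the object names (EMC.demo, fit, best) are made up here.

```r
library(phyclust, quietly = TRUE)

set.seed(1234)                     # initializations are random

# Toy nucleotide data shipped with the package; $org holds the coded sequences.
X <- seq.data.toy$org

# Build a control object rather than using the default .EMC.
# The small iteration counts are only to keep this sketch fast.
EMC.demo <- .EMControl(exhaust.iter = 1, short.iter = 5, EM.iter = 5)

# One fit with K = 2 clusters from a single initialization.
fit <- phyclust(X, 2, EMC = EMC.demo)
print(fit$logL)                    # log-likelihood of this fit
print(table(fit$class.id))         # cluster sizes

# Repeat the fit over the available initialization settings
# and keep the solution with the highest log-likelihood.
best <- find.best(X, 2, EMC = EMC.demo)
print(best$logL)
```

In practice the default control object .EMC, or larger iteration limits, would normally be used so that the EM algorithm can converge.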
ms and seqgen generate trees and sequences under a variety of conditions, and they can be combined to perform simulations.
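A minimal sketch of such a joint simulation is given below: ms draws one coalescent tree for five samples, and seqgen then evolves sequences along that tree. The option strings are ordinary ms and Seq-Gen command-line options (-T to output the tree; -mHKY, -l40 and -s0.2 for the substitution model, sequence length and branch scaling) and were chosen only for illustration. The sketch also assumes that the third element of the ms output holds the Newick tree when a single replicate is requested.

```r
library(phyclust, quietly = TRUE)

set.seed(123)

# Simulate one coalescent tree for 5 samples; "-T" asks ms to output the tree.
ret.ms <- ms(nsam = 5, nreps = 1, opts = "-T")

# Assumption: with one replicate, element 3 of the output is the Newick tree.
tree.text <- ret.ms[3]
print(tree.text)

# Evolve 40-site sequences along the tree under the HKY model,
# scaling branch lengths by 0.2.
ret.seq <- seqgen(opts = "-mHKY -l40 -s0.2", newick.tree = tree.text)
print(ret.seq)
```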
paml.baseml estimates trees from sequences.
haplo.post.prob is a modified version of Tzeng's method for haplotype grouping, which uses an evolutionary approach to group SNP sequences.
Some utility functions of the ape package are used in this package to plot trees, check object types, and read sequence data.
Phylogenetic Clustering Website: https://snoweye.github.io/phyclust/
Chen, W.-C. (2011) “Overlapping codon model, phylogenetic clustering, and alternative partial expectation conditional maximization algorithm”, Ph.D. Diss., Iowa State University.
Hudson, R.R. (2002) “Generating Samples under a Wright-Fisher Neutral Model of Genetic Variation”, Bioinformatics, 18, 337-338. http://home.uchicago.edu/~rhudson1/source.html
Rambaut, A. and Grassly, N.C. (1997) “Seq-Gen: An Application for the Monte Carlo Simulation of DNA Sequence Evolution along Phylogenetic Trees”, Computer Applications in the Biosciences, 13:3, 235-238. http://tree.bio.ed.ac.uk/software/seqgen/
Yang, Z. (1997) “PAML: a program package for phylogenetic analysis by maximum likelihood”, Computer Applications in the Biosciences, 13, 555-556. http://abacus.gene.ucl.ac.uk/software/paml.html
Yang, Z. (2007) “PAML 4: a program package for phylogenetic analysis by maximum likelihood”, Molecular Biology and Evolution, 24, 1586-1591.
Tzeng, J.Y. (2005) “Evolutionary-Based Grouping of Haplotypes in Association Analysis”, Genetic Epidemiology, 28, 220-231. https://www4.stat.ncsu.edu/~jytzeng/software.php
Paradis E., Claude J., and Strimmer K. (2004) “APE: analyses of phylogenetics and evolution in R language”, Bioinformatics, 20, 289-290. http://ape-package.ird.fr/
phyclust, .EMC, .EMControl, find.best.
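if (FALSE) {
# Load the package without startup messages.
library(phyclust, quietly = TRUE)

# List the available demos, then run the tree example.
demo(package = "phyclust")
demo("ex_trees", package = "phyclust")
}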