phastCons: Produce conservation scores and identify conserved elements, given a multiple alignment and a phylo-HMM.

Description

A phylo-HMM consisting of two states is assumed: a "conserved" state and a "non-conserved" state. If two phylogenetic models are given, the first is the conserved state, and the second is the non-conserved state. If only one model is given, then this is used as the non-conserved state, and the conserved state is obtained by multiplying the branch lengths by the parameter rho.

Usage

phastCons(msa, mod, rho = 0.3, target.coverage = 0.05,
  expected.length = 10, transitions = NULL, estimate.rho = FALSE,
  estimate.expected.length = FALSE, estimate.transitions = FALSE,
  estimate.trees = FALSE, viterbi = TRUE, gc = NULL, nrates = NULL,
  ref.idx = 1, quiet = FALSE)

Arguments

msa

An object of type msa representing the multiple alignment to be scored.

mod

Either a single object of type tm, or a list containing two tm objects. If two objects are given, they represent the conserved and non-conserved phylogenetic models. If one is given, then this represents the non-conserved model, and the conserved model is obtained by scaling the branches by a parameter rho.

rho

Set the scale (overall evolutionary rate) of the model for the conserved state to be <rho> times that of the model for the non-conserved state ( 0 < rho < 1). If used with estimate.trees or estimate.rho, the specified value will be used for initialization only, and rho will be estimated. This argument is ignored if mod contains two tree model objects.

target.coverage

A single numeric value, representing the fraction of sites in conserved elements. This argument sets a prior expectation rather than a posterior and assumes stationarity of the state-transition process. Adding this constraint causes the ratio of between-state transitions to be fixed at (1-gamma)/gamma (where gamma is the target.coverage value).

expected.length

A single numeric value, representing the parameter omega, which describes the expected length of conserved elements. This is an alternative to the transitions argument. If provided with target.coverage, than transition rates are fully determined, otherwise the target-coverage parameter will be estimated by maximum likelihood.

transitions

(Alternative to target.coverage and expected.length; ignored if either of these are specified). A numeric vector of length one or two, representing the transition probabilities for the two-state HMM. The first value represents mu, the transition rate from the conserved to the non-conserved state, and the second value is nu, the rate from non-conserved to conserved. If only one value is provided then mu=nu. The rate of self-transitions are then 1-mu and 1-nu, and the expected lengths of conserved and non-conserved elements are 1/mu and 1/nu, respectively. If estimate.transition is TRUE, the provided values will be used for initialization.

estimate.rho

A logical value. If TRUE, Estimate the parameter rho (as described above), using maximum likelihood. Estimated value is reported in return list. This use is discouraged (see note below).

estimate.expected.length

A logical value. If TRUE, estimate the expected length of conserved elements by maximum likelihood, and use the target.coverage parameter for initialization. Setting this parameter to TRUE is discouraged (see note below).

estimate.transitions

A logical value. If TRUE, estimate the transition rates between conserved and non-conserved states by maximum likelihood. The parameter transitions is then used for initialization. This argument is ignored if estimate.expected.length==TRUE. Setting this argument to TRUE is discouraged (see note below).

estimate.trees

A logical value. If TRUE, estimate free parameters of tree models for conserved and non-conserved state. Setting this argument to TRUE is discouraged (see note below).

viterbi

A logical value. If TRUE, produce discrete elements using the Viterbi algorithm.

A single numeric value given the fraction of bases that are G or C, for optional use with estimate.trees or estimate.rho. This overrides the default behavior of estimating the base composition empirically from the data.

nrates

An integer vector of length one or two, for optional use with estimate.trees and a discrete-gamma model. Assume the specified number of rate categories, rather than the number given in the input tree model(s). If two values are given they apply to the conserved and nonconserved models, respectively.

ref.idx

An integer value. Use the coordinate frame of the given sequence. Default is 1, indicating the first sequence in the alignment. A value of 0 indicates the coordinate frame of the entire alignment.

quiet

If TRUE, suppress printing of progress information.

Value

A list containing parameter estimates. The list may have any of the following elements, depending on the arguments:

transition.rates

A numeric vector of length two giving the rates from the conserved to the non-conserved state, and from the non-conserved to the conserved state.

rho

The relative evolutionary rate of the conserved state compared to the non-conserved state.

tree.models

Tree model objects describing the evolutionary process in the conserved and non-conserved states.

most.conserved

An object of type feat which describes conserved elements detected by the Viterbi algorithm.

post.prob.wig

A data frame giving a coordinate and score for individual bases in the alignment

likelihood

The likelihood of the data under the estimated model.

Examples

Run this code

# NOT RUN {
exampleArchive <- system.file("extdata", "examples.zip", package="rphast")
files <- c("ENr334-100k.fa", "rev.mod")
unzip(exampleArchive, files)
mod <- read.tm("rev.mod")
msa <- read.msa("ENr334-100k.fa")
rv <- phastCons(msa, mod)
names(rv)
rv2 <- phastCons(msa, mod, estimate.trees=TRUE)
names(rv2)
rv2$tree.models
unlink(files)
# }

Run the code above in your browser using DataLab