rho(sequence, wordsize = 2, alphabet = s2c("acgt"))
zscore(sequence, simulations = NULL, modele, exact = FALSE, alphabet = s2c("acgt"), ... )
simulations: if NULL, the analytical solution is computed when available (models base and codon). Otherwise, it should be the number of permutations for the z-score computation.
...: optional arguments passed on to the permutation function.
The rho
statistic, as presented in Karlin S., Cardon LR. (1994), can
be computed on each of the 16 dinucleotides. It is the frequency of
dinucleotide xy divided by the product of the frequencies of
nucleotide x and nucleotide y. It is expected to equal 1.00 when
dinucleotide xy is formed by pure chance, and it is greater
(respectively less) than 1.00 when dinucleotide xy is over-
(respectively under-) represented. The zscore
statistic, as presented in Palmeira, L., Guéguen, L.
and Lobry JR. (2006), is the normalization of the rho
statistic by its expectation and variance according to a
given random sequence generation model, and follows the
standard normal distribution. This statistic can be computed
with several models (cf. permutation for the description
of each of the models). Analytical calculations are provided for two of
them: the base permutations model and the codon permutations model.
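As a rough illustration of the rho definition above, the following base-R sketch computes rho for every dinucleotide by hand. The helper name rho_by_hand is hypothetical, and counting dinucleotides over all overlapping positions is an assumed convention; this is not the seqinr implementation.

```r
# Hypothetical helper: rho for all 16 dinucleotides, counted over
# overlapping positions (an assumed convention, not necessarily seqinr's).
rho_by_hand <- function(sequence, alphabet = c("a", "c", "g", "t")) {
  n <- length(sequence)
  # Single-nucleotide frequencies
  f1 <- as.vector(table(factor(sequence, levels = alphabet))) / n
  # Overlapping dinucleotides: positions (1,2), (2,3), ...
  di <- paste0(sequence[-n], sequence[-1])
  words <- as.vector(outer(alphabet, alphabet, paste0))
  f2 <- as.vector(table(factor(di, levels = words))) / (n - 1)
  # Expected frequency of xy under independence: f(x) * f(y)
  expected <- as.vector(outer(f1, f1))
  setNames(f2 / expected, words)
}

set.seed(1)
sequence <- sample(c("a", "c", "g", "t"), size = 6000, replace = TRUE)
round(rho_by_hand(sequence), 2)
```

For a sequence drawn uniformly at random, as here, the values should all lie close to 1.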
The base model allows for random sequence generation by
shuffling (with/without replacement) of all bases in the sequence.
Analytical computations are available for this model: either as an
approximation for large sequences (cf. Palmeira, L., Guéguen, L.
and Lobry JR. (2006)), or as the exact analytical formulae
(cf. Schbath, S. (1995)).
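To make the permutation idea concrete, the sketch below estimates a z-score for one dinucleotide by shuffling all bases without replacement and standardizing the observed rho against the simulated values. The helpers rho_word and zscore_base_sim are hypothetical names, and this is a Monte Carlo approximation in base R, not the analytical formulae cited above and not the seqinr code.

```r
# Hypothetical helper: rho for a single dinucleotide xy, overlapping counts.
rho_word <- function(sequence, x, y) {
  n <- length(sequence)
  fxy <- mean(sequence[-n] == x & sequence[-1] == y)
  fxy / (mean(sequence == x) * mean(sequence == y))
}

# Hypothetical helper: simulated z-score under a base-shuffling model.
zscore_base_sim <- function(sequence, x, y, simulations = 200) {
  observed <- rho_word(sequence, x, y)
  # sample() without replacement permutes the bases, keeping composition fixed
  simulated <- replicate(simulations, rho_word(sample(sequence), x, y))
  (observed - mean(simulated)) / sd(simulated)
}

set.seed(1)
sequence <- sample(c("a", "c", "g", "t"), size = 600, replace = TRUE)
z <- zscore_base_sim(sequence, "c", "g")
z
```

More simulations give a more stable estimate of the mean and standard deviation, at a proportional cost in run time.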
The position model allows for random sequence generation
by shuffling (with/without replacement) of bases within their
position in the codon (bases in position I, II or III stay in
position I, II or III in the new sequence).
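A minimal base-R sketch of this kind of shuffle, without replacement, is given below. The helper name shuffle_by_position is hypothetical, and the sketch assumes the sequence is in frame with a length that is a multiple of 3.

```r
# Hypothetical helper: shuffle bases within each codon position (I, II, III).
shuffle_by_position <- function(sequence) {
  stopifnot(length(sequence) %% 3 == 0)
  pos <- rep(1:3, length.out = length(sequence))
  shuffled <- sequence
  for (p in 1:3) {
    idx <- which(pos == p)
    # Permute only the bases that share codon position p
    shuffled[idx] <- sequence[idx][sample.int(length(idx))]
  }
  shuffled
}

set.seed(1)
sequence <- sample(c("a", "c", "g", "t"), size = 30, replace = TRUE)
shuffled <- shuffle_by_position(sequence)
```

The base composition at each codon position is unchanged; only the order of bases within a position differs.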
The codon model allows for random sequence generation by
shuffling (with/without replacement) of codons. Analytical
computation is available for this model (Gautier, C., Gouy, M. and
Louail, S. (1985)).
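The codon shuffle, without replacement, can be sketched in base R as follows. The helper name shuffle_codons is hypothetical, and the sketch assumes an in-frame sequence whose length is a multiple of 3.

```r
# Hypothetical helper: permute whole codons, keeping each codon intact.
shuffle_codons <- function(sequence) {
  stopifnot(length(sequence) %% 3 == 0)
  # One column per codon
  codons <- matrix(sequence, nrow = 3)
  # Permute the columns, then flatten back column by column
  as.vector(codons[, sample.int(ncol(codons)), drop = FALSE])
}

set.seed(1)
sequence <- sample(c("a", "c", "g", "t"), size = 30, replace = TRUE)
shuffled <- shuffle_codons(sequence)
```

Codon usage is preserved exactly; only the dinucleotides that span two adjacent codons can change.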
The syncodon model allows for random sequence generation
by shuffling (with/without replacement) of synonymous codons.
Schbath, S. (1995) Étude asymptotique du nombre d'occurrences d'un mot dans une chaîne de Markov et application à la recherche de mots de fréquence exceptionnelle dans les séquences d'ADN. Thèse de l'Université René Descartes, Paris V.
Palmeira, L., Guéguen, L. and Lobry, J.R. (2006) UV-targeted dinucleotides are not depleted in light-exposed Prokaryotic genomes. Molecular Biology and Evolution, 23:2214-2219.
citation("seqinr")
permutation
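library(seqinr)
# Random DNA sequence of 6000 bases
sequence <- sample(x = s2c("acgt"), size = 6000, replace = TRUE)
# rho statistic for each of the 16 dinucleotides
rho(sequence)
# z-scores under the base model (analytical approximation, then exact formulae)
zscore(sequence, modele = "base")
zscore(sequence, modele = "base", exact = TRUE)
# z-scores under the codon model (analytical)
zscore(sequence, modele = "codon")
# z-scores under the syncodon model, from 1000 simulated permutations
zscore(sequence, simulations = 1000, modele = "syncodon")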