These two functions compute two different statistics that measure dinucleotide over- and under-representation: the rho statistic and the z-score, each computed for all 16 dinucleotides.
rho(sequence, wordsize = 2, alphabet = s2c("acgt"))
zscore(sequence, simulations = NULL, modele, exact = FALSE, alphabet = s2c("acgt"), ... )
a vector of single characters.
an integer giving the size of word (n-mer) to consider.
If NULL, the analytical solution is computed
    when available (models base and codon). Otherwise, it
    should be the number of permutations used for the z-score computation.
A string of characters describing the model chosen for the random generation
Whether exact analytical calculation or an approximation should be used
A vector of single characters.
Optional parameters for specific model permutations are
    passed on to permutation function.
a table containing the computed statistic for each dinucleotide
The rho statistic, as presented in Karlin S., Cardon LR. (1994), can
  be computed on each of the 16 dinucleotides. It is the frequency of
  dinucleotide xy divided by the product of the frequencies of
  nucleotide x and nucleotide y. It is equal to 1.00 when
  dinucleotide xy is formed by pure chance, and it is greater
  (respectively less) than 1.00 when dinucleotide xy is over-
  (respectively under-) represented. Note that to reproduce
  Karlin's results you have to compute the statistic from the sequence
  concatenated with its reverse complement, that is, with something
  like rho(c(myseq, rev(comp(myseq)))).
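As an illustration of the definition above, here is a minimal base R sketch of the ratio f(xy) / (f(x) f(y)) over overlapping dinucleotides. The helper name rho_sketch is hypothetical and is not part of seqinr; the package's own rho() may differ in details such as how frequencies are counted or how the result is returned.

```r
# Hypothetical helper for illustration; not a seqinr function.
rho_sketch <- function(sequence, alphabet = c("a", "c", "g", "t")) {
  n <- length(sequence)
  # Frequencies of single nucleotides
  f1 <- as.vector(table(factor(sequence, levels = alphabet))) / n
  # Counts of overlapping dinucleotides: rows = first base, columns = second base
  counts <- table(factor(sequence[-n], levels = alphabet),
                  factor(sequence[-1], levels = alphabet))
  f2 <- unclass(counts) / (n - 1)
  # rho_xy = f_xy / (f_x * f_y)
  r <- f2 / outer(f1, f1)
  # Flatten row by row so that names read "aa", "ac", ..., "tt"
  out <- as.vector(t(r))
  names(out) <- as.vector(t(outer(alphabet, alphabet, paste0)))
  out
}

myseq <- rep(c("a", "c", "g", "t"), times = 250)
rho_seq <- rho_sketch(myseq)

# Karlin-style input: the sequence followed by its reverse complement
both_strands <- c(myseq, rev(chartr("acgt", "tgca", myseq)))
rho_both <- rho_sketch(both_strands)
```

In this toy sequence only the dinucleotides ac, cg, gt and ta occur, so their rho values are well above 1 and all others are 0.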
The zscore statistic is presented in Palmeira, L., Guéguen, L.
  and Lobry JR. (2006). The statistic is the normalization of the
  rho statistic by its expectation and variance according to a
  given random sequence generation model, and follows the
  standard normal distribution. This statistic can be computed
  with several models (cf. permutation for the description
  of each of the models). Analytical calculations are provided for two of
  them: the base permutations model and the codon
  permutations model.
The base model allows for random sequence generation by
  shuffling (with/without replacement) of all bases in the sequence.
  Analytical computations are available for this model: either as an 
  approximation for large sequences (cf. Palmeira, L., Guéguen, L.
  and Lobry JR. (2006)), or as the exact analytical formulae
  (cf. Schbath, S. (1995)).
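When no analytical formula is used, the same normalization can be estimated by simulation. The following base R sketch assumes the base model without replacement (a plain shuffle of all bases) and estimates the expectation and standard deviation of rho from the shuffled sequences. The helper names dinuc_rho and zscore_base_sketch are hypothetical, and this is not presented as the code seqinr runs internally.

```r
# Hypothetical helpers for illustration; not seqinr functions.
dinuc_rho <- function(sequence, x, y) {
  n <- length(sequence)
  # Frequency of the overlapping dinucleotide xy
  f_xy <- sum(sequence[-n] == x & sequence[-1] == y) / (n - 1)
  f_xy / (mean(sequence == x) * mean(sequence == y))
}

zscore_base_sketch <- function(sequence, x, y, simulations = 200) {
  observed <- dinuc_rho(sequence, x, y)
  # Base model without replacement: shuffle all bases, keeping the composition
  simulated <- replicate(simulations, dinuc_rho(sample(sequence), x, y))
  # Normalize the observed rho by the simulated mean and standard deviation
  (observed - mean(simulated)) / sd(simulated)
}

set.seed(1)
# Toy sequence in which "cg" is strongly over-represented by construction
toy <- rep(c("c", "g", "a", "t", "c", "g"), times = 100)
z <- zscore_base_sketch(toy, "c", "g")
```

A large positive z indicates over-representation relative to the shuffled sequences; the estimate varies with the random seed and the number of simulations.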
The position model allows for random sequence generation
  by shuffling (with/without replacement) of bases within their
  position in the codon (bases in position I, II or III stay in
  position I, II or III in the new sequence).
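A minimal base R sketch of this kind of shuffle, without replacement, is given below. It assumes the sequence starts in frame and has a length that is a multiple of 3. The helper name shuffle_by_position is hypothetical; the package's permutation function is the reference for the actual behaviour and options.

```r
# Hypothetical helper for illustration; not a seqinr function.
shuffle_by_position <- function(sequence) {
  stopifnot(length(sequence) %% 3 == 0)
  # Codon position (1, 2 or 3) of each base
  pos <- rep(1:3, length.out = length(sequence))
  shuffled <- sequence
  for (p in 1:3) {
    idx <- which(pos == p)
    # Permute bases among the sites that share this codon position
    shuffled[idx] <- sequence[idx[sample.int(length(idx))]]
  }
  shuffled
}

set.seed(42)
cds <- c("a", "t", "g", "g", "c", "a", "t", "t", "c", "t", "a", "a")
new_cds <- shuffle_by_position(cds)
```

The base composition at each codon position is unchanged; only the arrangement of bases across codons differs.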
The codon model allows for random sequence generation by
  shuffling (with/without replacement) of codons. Analytical
  computation is available for this model (Gautier, C., Gouy, M. and
  Louail, S. (1985)).
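For comparison, a codon shuffle without replacement can be sketched in base R as follows, again assuming an in-frame sequence whose length is a multiple of 3. The helper name shuffle_codons is hypothetical and the sketch makes no attempt to reproduce any special handling the package may apply.

```r
# Hypothetical helper for illustration; not a seqinr function.
shuffle_codons <- function(sequence) {
  stopifnot(length(sequence) %% 3 == 0)
  # One column per codon (the matrix is filled column by column)
  codons <- matrix(sequence, nrow = 3)
  # Permute the columns, then flatten back to a vector of bases
  as.vector(codons[, sample.int(ncol(codons)), drop = FALSE])
}

set.seed(7)
cds <- c("a", "t", "g", "g", "c", "a", "t", "t", "c", "t", "a", "a")
new_cds <- shuffle_codons(cds)

# Codons as strings, to compare the two sequences
as_codons <- function(s) apply(matrix(s, nrow = 3), 2, paste, collapse = "")
```

The set of codons, and therefore the codon usage, is preserved; only the order of the codons changes.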
The syncodon model allows for random sequence generation
  by shuffling (with/without replacement) of synonymous codons.
Gautier, C., Gouy, M. and Louail, S. (1985) Non-parametric statistics for nucleic acid sequence study. Biochimie, 67:449-453.
Karlin S. and Cardon LR. (1994) Computational DNA sequence analysis. Annu Rev Microbiol, 48:619-654.
Schbath, S. (1995) Étude asymptotique du nombre d'occurrences d'un mot dans une chaîne de Markov et application à la recherche de mots de fréquence exceptionnelle dans les séquences d'ADN. Thèse de l'Université René Descartes, Paris V.
Palmeira, L., Guéguen, L. and Lobry, J.R. (2006) UV-targeted dinucleotides are not depleted in light-exposed Prokaryotic genomes. Molecular Biology and Evolution, 23:2214-2219. https://academic.oup.com/mbe/article/23/11/2214/1335460
citation("seqinr")
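# NOT RUN {
library(seqinr)
# Random DNA sequence of 6000 bases (a multiple of 3, so it splits into codons)
sequence <- sample(x = s2c("acgt"), size = 6000, replace = TRUE)
# rho statistic for the 16 dinucleotides
rho(sequence)
# z-score under the base model: large-sequence approximation, then exact formulae
zscore(sequence, modele = "base")
zscore(sequence, modele = "base", exact = TRUE)
# z-score under the codon model (analytical computation)
zscore(sequence, modele = "codon")
# z-score under the syncodon model, using 1000 permutations
zscore(sequence, simulations = 1000, modele = "syncodon")
# }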