ape (version 5.5)

base.freq: Base frequencies from DNA Sequences

Description

base.freq computes the frequencies (absolute or relative) of the four DNA bases (adenine, cytosine, guanine, and thymine) from a sample of sequences.

GC.content computes the proportion of G+C (using the previous function). All missing or unknown sites are ignored.

Ftab computes the contingency table with the absolute frequencies of the DNA bases from a pair of sequences.

Usage

base.freq(x, freq = FALSE, all = FALSE)
GC.content(x)
Ftab(x, y = NULL)

Arguments

x

a vector, a matrix, or a list which contains the DNA sequences.

y

a vector with a single DNA sequence.

freq

a logical specifying whether to return the proportions (FALSE, the default) or the absolute frequencies, i.e. counts (TRUE).

all

a logical; by default only the counts of A, C, G, and T are returned. If all = TRUE, all counts of bases, ambiguous codes, missing data, and alignment gaps are returned.

Value

base.freq returns a numeric vector with names c("a", "c", "g", "t") (and possibly "r", "m", ... if all = TRUE); GC.content returns a single numeric value; Ftab returns a four by four matrix with similar dimnames.

Details

The base frequencies are computed over all sequences in the sample.
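
As a rough illustration of what pooling over all sequences means, the base R sketch below counts bases across a small character matrix. It does not call ape: the toy alignment and the helper name pooled_base_freq are assumptions made for this example, whereas base.freq itself works on DNA sequence objects such as DNAbin.

```r
# Toy alignment (an assumption for illustration): 3 sequences x 6 sites,
# lower-case bases, "n" = unknown base, "-" = alignment gap.
aln <- rbind(
  s1 = c("a", "c", "g", "t", "a", "n"),
  s2 = c("a", "c", "g", "g", "-", "a"),
  s3 = c("t", "c", "g", "t", "a", "a")
)

# Hypothetical helper, not part of ape: pool every site of every sequence,
# keep only a/c/g/t, and return proportions (freq = FALSE) or counts (freq = TRUE).
pooled_base_freq <- function(x, freq = FALSE) {
  bases <- c("a", "c", "g", "t")
  pooled <- as.vector(x)                      # all sequences together
  pooled <- pooled[pooled %in% bases]         # drop unknown sites and gaps
  counts <- table(factor(pooled, levels = bases))
  counts <- setNames(as.numeric(counts), bases)
  if (freq) counts else counts / sum(counts)
}

counts <- pooled_base_freq(aln, freq = TRUE)  # absolute counts
props  <- pooled_base_freq(aln)               # proportions summing to 1
gc     <- sum(props[c("g", "c")])             # G+C proportion from the same vector
```

The last line mirrors the description of GC.content as a quantity derived from the base frequencies; the unknown site and the gap do not enter any of the three results.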

For Ftab, if the argument y is given then both x and y are coerced to vectors and must be of equal length. If y is not given, x must be a matrix or a list and only the first two sequences are used.
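
The pairwise table can be pictured with a short base R sketch. It assumes two equal-length character vectors and assumes that sites where either sequence holds something other than a, c, g, or t are skipped; the helper name pair_table and the toy sequences are hypothetical, and this is not ape's implementation of Ftab.

```r
# Two toy sequences of equal length (assumptions for illustration).
seq1 <- c("a", "c", "g", "t", "a", "n", "g")
seq2 <- c("a", "c", "a", "t", "-", "a", "g")

# Hypothetical helper, not part of ape: cross-tabulate the bases of x (rows)
# against the bases of y (columns), site by site.
pair_table <- function(x, y) {
  stopifnot(length(x) == length(y))           # sequences must be aligned
  bases <- c("a", "c", "g", "t")
  keep <- x %in% bases & y %in% bases         # assumed: skip gaps and unknowns
  tab <- table(factor(x[keep], levels = bases),
               factor(y[keep], levels = bases))
  matrix(as.integer(tab), nrow = 4, ncol = 4,
         dimnames = list(bases, bases))
}

m <- pair_table(seq1, seq2)
m
```

Diagonal cells count sites where the two sequences agree; off-diagonal cells count the observed substitutions, with rows indexed by the first sequence and columns by the second.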

See Also

seg.sites, nuc.div (in pegas), DNAbin

Examples

library(ape)
data(woodmouse)
base.freq(woodmouse)                   # proportions of a, c, g, t
base.freq(woodmouse, TRUE)             # absolute counts
base.freq(woodmouse, TRUE, TRUE)       # counts including ambiguity codes, missing data, and gaps
GC.content(woodmouse)
Ftab(woodmouse)                        # uses the first two sequences
Ftab(woodmouse[1, ], woodmouse[2, ])   # same as above
Ftab(woodmouse[14:15, ])               # between the last two
