haplotype: Haplotype Extraction and Frequencies

Description

haplotype extracts the haplotypes from a set of DNA sequences. The result can be plotted with the plot method described below.

Usage

haplotype(x, ...)
# S3 method for DNAbin
haplotype(x, labels = NULL, strict = FALSE,
                  trailingGapsAsN = TRUE, ...)
# S3 method for character
haplotype(x, labels = NULL, ...)
# S3 method for numeric
haplotype(x, labels = NULL, ...)
# S3 method for haplotype
plot(x, xlab = "Haplotype", ylab = "Number", ...)
# S3 method for haplotype
print(x, ...)
# S3 method for haplotype
summary(object, ...)
# S3 method for haplotype
sort(x,
     decreasing = ifelse(what == "frequencies", TRUE, FALSE),
     what = "frequencies", ...)
# S3 method for haplotype
[(x, ...)

Value

haplotype returns an object of class c("haplotype", "DNAbin"), which is an object of class "DNAbin" with two additional attributes: "index", giving for each haplotype the indices of the observations that share it, and "from", giving the name of the original data.

sort returns an object of the same class respecting its attributes.

Arguments

x: a set of DNA sequences (as an object of class "DNAbin"), or an object of class "haplotype".
object: an object of class "haplotype".
labels: a vector of character strings used as names for the rows of the returned object. By default, Roman numerals are given.
strict: a logical value; if TRUE, ambiguities and gaps in the sequences are not interpreted as such and are instead treated as separate characters.
trailingGapsAsN: a logical value; if TRUE (the default), the leading and trailing alignment gaps are considered as unknown bases (i.e., N). This option has no effect if strict = TRUE.
xlab, ylab: labels for the x- and y-axes.
...: further arguments passed to barplot (unused in print and sort).
decreasing: a logical value specifying in which order to sort the haplotypes; by default this depends on the value of what.
what: a character specifying on what feature the haplotypes should be sorted: this must be "frequencies" or "labels", or an unambiguous abbreviation of these.

Author

Emmanuel Paradis

Details

The way ambiguities in the sequences are taken into account is explained in a post to r-sig-phylo (see the examples below):

https://www.mail-archive.com/r-sig-phylo@r-project.org/msg05541.html

The sort method sorts the haplotypes by decreasing frequency (the default) or in alphabetical order of their labels (if what = "labels"). Note that if these labels are Roman numerals (as assigned by haplotype), their alphabetical order may differ from their numerical order (e.g., IX comes alphabetically before VIII).
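
The difference between the two orders can be checked with base R alone. The following is a minimal sketch that does not call pegas; the variable names are illustrative, and as.roman comes from the utils package.

```r
## Roman numeral labels for ten hypothetical haplotypes
labs <- as.character(as.roman(1:10))

## alphabetical order, as used when sorting on labels
alphabetical <- sort(labs)
alphabetical

## numerical order, recovered by converting the labels back to integers
numerical <- labs[order(as.integer(as.roman(labs)))]
numerical
```

If the numerical order matters, one option is to supply labels that sort alphabetically in the intended order through the labels argument.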

From pegas 0.7, haplotype extracts haplotypes taking base ambiguities into account (see the examples below).
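
To see why ambiguity codes make pooling order-dependent, it may help to look at which IUPAC codes are compatible with each other. The sketch below is an illustration in base R under a simplified assumption (two codes are compatible when they share at least one possible base); the iupac list and the compatible function are hypothetical helpers, not part of pegas, and this is not how pegas itself is implemented.

```r
## possible bases for a few IUPAC codes (hypothetical helper, not from pegas)
iupac <- list(A = "A", C = "C", G = "G", T = "T",
              R = c("A", "G"), Y = c("C", "T"),
              N = c("A", "C", "G", "T"))

## assumption: two codes are compatible if they share at least one base
compatible <- function(a, b) length(intersect(iupac[[a]], iupac[[b]])) > 0

## pairwise compatibility for the four codes of the last example
codes <- c("Y", "A", "R", "N")
m <- sapply(codes, function(b) sapply(codes, function(a) compatible(a, b)))
m
```

Under this assumption, Y is compatible only with itself and N, whereas N is compatible with every code. Compatibility is therefore not transitive, which is why the order in which sequences are pooled can change the result.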

Examples

library(pegas) # also attaches ape, which provides 'woodmouse'

## generate some artificial data from 'woodmouse':
data(woodmouse)
x <- woodmouse[sample(15, size = 110, replace = TRUE), ]
(h <- haplotype(x))
## the indices of the individuals belonging to the 1st haplotype:
attr(h, "index")[[1]]
plot(sort(h))
## get the frequencies in a named vector:
setNames(lengths(attr(h, "index")), labels(h))

## data posted by Hirra Farooq on r-sig-phylo (see link above):
cat(">[A]\nCCCGATTTTATATCAACATTTATTT------",
    ">[D]\nCCCGATTTT----------------------",
    ">[B]\nCCCGATTTTATATCAACATTTATTT------",
    ">[C]\nCCCGATTTTATATCACCATTTATTTTGATTT",
    file = "x.fas", sep = "\n")
x <- read.dna("x.fas", "f")
unlink("x.fas")

## show the sequences and the distances:
alview(x)
dist.dna(x, "N", p = TRUE)

## by default there are 3 haplotypes with a warning about ambiguity:
haplotype(x)

## the same 3 haplotypes without warning:
haplotype(x, strict = TRUE)

## if we remove the last sequence there is, by default, a single haplotype:
haplotype(x[-4, ])

## to get two haplotypes separately as with the complete data:
haplotype(x[-4, ], strict = TRUE)

## a simpler example:
y <- as.DNAbin(matrix(c("A", "A", "A", "A", "R", "-"), 3))
haplotype(y) # 1 haplotype
haplotype(y, strict = TRUE) # 3 haplotypes
haplotype(y, trailingGapsAsN = FALSE) # 2 haplotypes

## a tricky example with 4 sequences and 1 site:
z <- as.DNAbin(matrix(c("Y", "A", "R", "N"), 4))
alview(z, showpos = FALSE)

## a single haplotype is identified:
haplotype(z)
## 'Y' has zero-distance with (and only with) 'N', so they are pooled
## together; at a later iteration of this pooling step, 'N' has
## zero-distance with 'R' (and ultimately with 'A') so they are pooled

## if the sequences are ordered differently, 'Y' and 'A' are separated:
haplotype(z[c(4, 1:3), ])
