subset.haplotype: Subsetting and Filtering Haplotypes

Description

This function selects haplotypes based on their (absolute) frequencies and/or proportions of missing nucleotides.

Usage

# S3 method for haplotype
subset(x, minfreq = 1, maxfreq = Inf, maxna = Inf, na = c("N", "?"), ...)

Value

an object of class c("haplotype", "DNAbin").

Arguments

x: an object of class c("haplotype", "DNAbin").
minfreq, maxfreq: the lower and upper limits of (absolute) haplotype frequencies. By default, all haplotypes are selected whatever their frequency.
maxna: the maximum frequency (absolute or relative; see details) of missing nucleotides within a given haplotype.
na: a character vector specifying which nucleotide symbols are treated as missing data; by default, the unknown nucleotide (N) and the completely unknown site (?). Symbols can be given in lower- or uppercase. Two shortcuts are available: see details.
...: unused.

Author

Emmanuel Paradis

Details

The value of maxna can be either less than one, or greater than or equal to one. In the former case, it is taken as the maximum proportion (relative frequency) of missing data within a given haplotype. In the latter case, it is taken as the maximum number (absolute frequency) of missing nucleotides.
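
The two readings of maxna can be illustrated on a toy example in base R. This is only a sketch of the rule described above, not the code used by the function: the matrix seqs, the vector na_symbols, and the thresholds are all hypothetical, and the thresholds are chosen away from the boundary because this page does not say whether the limit is inclusive.

```r
# Hypothetical toy alignment: 3 haplotypes (rows) by 10 sites (columns).
seqs <- rbind(
  c("a", "c", "g", "t", "a", "c", "g", "t", "a", "c"),
  c("a", "n", "g", "t", "a", "c", "g", "?", "a", "c"),
  c("n", "n", "n", "t", "a", "c", "?", "t", "a", "n")
)

# Symbols counted as missing (mirrors the default na = c("N", "?"), in lowercase).
na_symbols <- c("n", "?")

# Number of missing symbols in each haplotype.
is_missing <- matrix(seqs %in% na_symbols, nrow = nrow(seqs))
n_missing <- rowSums(is_missing)

# Absolute reading: at most 3 missing nucleotides per haplotype.
keep_abs <- n_missing <= 3

# Relative reading: at most 30% missing nucleotides per haplotype.
keep_rel <- n_missing / ncol(seqs) <= 0.3

# With 10 sites, both readings select the same haplotypes.
identical(keep_abs, keep_rel)
```

Here the third haplotype has five missing symbols out of ten sites, so it is dropped under either reading, in the same way that maxna = 20 and maxna = 20/ncol(h) are equivalent in the examples.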

na = "all" is a shortcut for all ambiguous nucleotides (including N) plus alignment gaps and completely unknown sites (?).

na = "ambiguous" is a shortcut for only ambiguous nucleotides (including N).
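
As a brief sketch, the two shortcuts could be used as follows, assuming the pegas package (which provides this method) is installed and that the woodmouse data set is available from ape, which pegas loads. The object names h_all and h_amb are illustrative only, and no output is shown because the result depends on the data.

```r
library(pegas)  # assumed to be installed; attaching it also attaches ape

data(woodmouse)
h <- haplotype(woodmouse)

# Count every ambiguous nucleotide, alignment gap, and "?" as missing.
h_all <- subset(h, maxna = 20, na = "all")

# Count only ambiguous nucleotides (including N) as missing.
h_amb <- subset(h, maxna = 20, na = "ambiguous")

# Compare how many haplotypes pass each filter.
nrow(h_all)
nrow(h_amb)
```

Because "all" counts a superset of the symbols counted by "ambiguous", it should never retain more haplotypes for the same value of maxna.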

Examples

data(woodmouse)
h <- haplotype(woodmouse)
subset(h, maxna = 20)
subset(h, maxna = 20/ncol(h)) # same as above