bio3d (version 2.1-2)

ide.filter: Percent Identity Filter

Description

Identify and filter subsets of sequences at a given sequence identity cutoff.

Usage

ide.filter(aln = NULL, ide = NULL, cutoff = 0.6, verbose = TRUE, ncore=1, nseg.scale=1)

Arguments

aln
sequence alignment list, obtained from seqaln or read.fasta, or an alignment character matrix. Not used if ide is given.
ide
an optional identity matrix obtained from seqidentity.
cutoff
a numeric identity cutoff value ranging between 0 and 1.
verbose
logical, if TRUE print details of the clustering process.
ncore
number of CPU cores used for the calculation. ncore>1 requires the parallel package to be installed.
nseg.scale
split input data into specified number of segments prior to running multiple core calculation. See fit.xyz.

Value

Returns a list object with components:

ind
indices of the sequences below the cutoff value.
tree
an object of class "hclust", which describes the tree produced by the clustering process.
ide
a numeric matrix with all pairwise identity values.

Details

This function performs hierarchical cluster analysis of a given sequence identity matrix ide, or the identity matrix calculated from a given alignment aln, to identify sequences that fall below a given identity cutoff value cutoff.
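The idea described above can be illustrated with base R alone. The following is a minimal sketch, not the bio3d implementation: it assumes identities are converted to distances as 1 - identity, that the tree is cut at a height of 1 - cutoff, and that the first member of each resulting cluster is kept as its representative. The toy matrix and the names seq1 to seq4 are made up for illustration.

```r
# Toy symmetric identity matrix for four hypothetical sequences
ide <- matrix(c(1.00, 0.90, 0.30, 0.25,
                0.90, 1.00, 0.35, 0.20,
                0.30, 0.35, 1.00, 0.80,
                0.25, 0.20, 0.80, 1.00),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("seq", 1:4), paste0("seq", 1:4)))
cutoff <- 0.6

# Turn identities into distances and cluster them
# (complete linkage is the default method of stats::hclust)
tree <- hclust(as.dist(1 - ide))

# Cut the tree at the height that corresponds to the identity cutoff
groups <- cutree(tree, h = 1 - cutoff)

# Assumption: keep the first member of each cluster as its representative
ind <- which(!duplicated(groups))

# Pairwise identities among the retained sequences
ide[ind, ind]
```

In this toy case seq1/seq2 and seq3/seq4 each form a cluster, so one sequence from each pair is retained. How ide.filter itself selects the retained sequences may differ from this sketch.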

References

Grant, B.J. et al. (2006) Bioinformatics 22, 2695--2696.

See Also

read.fasta, seqaln, seqidentity, entropy, consensus

Examples

data(kinesin)
attach(kinesin, warn.conflicts=FALSE)

ide.mat <- seqidentity(pdbs)

# Histogram of pairwise identity values
op <- par(no.readonly=TRUE)
par(mfrow=c(2,1))
hist(ide.mat[upper.tri(ide.mat)], breaks=30,xlim=c(0,1),
     main="Sequence Identity", xlab="Identity")

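# Filter sequences at a 0.6 identity cutoff
k <- ide.filter(ide=ide.mat, cutoff=0.6)

# Recompute pairwise identities for the retained sequences only
ide.cut <- seqidentity(pdbs$ali[k$ind,])
hist(ide.cut[upper.tri(ide.cut)], breaks=10, xlim=c(0,1),
     main="Sequence Identity", xlab="Identity")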

#plot(k$tree, axes = FALSE, ylab="Sequence Identity")
#print(k$ind) # selected
par(op)
detach(kinesin)