seqTools (version 1.6.0)

countDnaKmers: Counting k-mers in DNA sequence.

Description

Counts occurrences of DNA k-mers in a given DNA sequence. The k-mers are searched for in a set of search windows defined by the start and width parameters. Each value in the start vector defines the left border of a search window, and the corresponding value in the width vector gives its size. At each position of a search window, the k-mer that begins at that position and extends to the right is counted. The function is intended for counting DNA k-mers in selected regions (e.g. exons) of a DNA sequence.

Usage

countDnaKmers(dna,k,start,width)

Arguments

dna
character. Single DNA sequence (vector of length 1). dna must not contain characters other than "ATCGN". Capitalization does not matter. When an 'N' character is found, the current DNA k-mer is skipped.
k
numeric. Number of nucleotides in tabled DNA motifs.
start
numeric. Vector of (1-based) start positions for reading frames. Each reading frame extends to the right along the DNA string.
width
numeric. Defines the size of the search window for each start position. Must have the same length as start or length 1 (in which case the value of width is recycled).
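
To illustrate how start and width describe the search windows, the following base R sketch recycles a single width value and lists the left and right border of each window. It uses base R only and assumes that recycling behaves like rep_len(); how seqTools recycles width internally is not shown here. The data frame windows is a hypothetical name used for illustration.

```r
# Illustration only: base R, not the seqTools implementation.
start <- 1:3
width <- 3

# Recycle a single width value across all start positions (assumed behaviour).
width <- rep_len(width, length(start))

# Each window covers the positions start, ..., start + width - 1.
windows <- data.frame(start = start, end = start + width - 1)
print(windows)
```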

Value

matrix. Each column contains the motif counts for one frame. The column names are the values in the start vector. Each row represents one DNA motif; the sequence of the motif is given as the row name.

Details

DNA k-mers are counted starting at every position in {start, ..., start + width - 1}. Because identifying a k-mer reads a sequence window of size k, the last permitted start position for a k-mer is nchar(dna) - k + 1. The function throws the error 'Search region exceeds string end' when start + width + k > nchar(dna) + 2 for any window.
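
The counting rule and the boundary check described above can be sketched in base R for a single search window. The helper count_window_kmers is hypothetical and not part of seqTools. It returns a table containing only the k-mers that occur, so it does not reproduce the layout of the value returned by countDnaKmers.

```r
# Hypothetical helper for illustration; not part of seqTools.
count_window_kmers <- function(dna, k, start, width) {
  dna <- toupper(dna)  # capitalization does not matter
  n <- nchar(dna)

  # Boundary check as stated in Details: the k-mer starting at the last
  # window position (start + width - 1) must end inside the sequence.
  if (start + width + k > n + 2) {
    stop("Search region exceeds string end")
  }

  # One k-mer per window position, read to the right.
  pos <- seq(start, start + width - 1)
  kmers <- substring(dna, pos, pos + k - 1)

  # Skip k-mers that contain 'N'.
  kmers <- kmers[!grepl("N", kmers, fixed = TRUE)]

  table(kmers)
}

dna <- "ATAAATA"
counts <- count_window_kmers(dna, k = 2, start = 1, width = 3)
print(counts)
```

With k = 2 and nchar(dna) = 7, a window of width 3 may start no later than position 4, since 4 + 3 + 2 = 9 does not exceed nchar(dna) + 2 = 9.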

See Also

countGenomeKmers

Examples

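library(seqTools)
# Count 2-mers in three search windows of width 3, starting at positions 1, 2 and 3
seq <- "ATAAATA"
countDnaKmers(seq, 2, 1:3, 3)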