seqinr (version 3.1-2)

words.pos: Positions of possibly degenerated motifs within sequences

Description

words.pos searches for all occurrences of the motif pattern within the sequence text and returns their positions. The function is based on regexpr, so complex motifs can be expressed as regular expressions. The main difference from gregexpr is that overlapping (non-disjoint) matches are also reported.

Usage

words.pos(pattern, text, ignore.case = FALSE,
                      perl = TRUE, fixed = FALSE, useBytes = TRUE, ...)

Arguments

pattern
character string containing a regular expression (or character string for fixed = TRUE) to be matched in the given character vector.
text
a character vector where matches are sought.
ignore.case
if FALSE, the pattern matching is case sensitive and if TRUE, case is ignored during matching.
perl
logical. Should perl-compatible regexps be used if available? Has priority over extended.
fixed
logical. If TRUE, pattern is a string to be matched as is. Overrides all conflicting arguments.
useBytes
logical. If TRUE the matching is done byte-by-byte rather than character-by-character.
...
arguments passed to regexpr.

Value

  • a vector of positions for which the motif pattern was found in the sequence text.

Details

Default parameter values have been tuned for speed when working with biological sequences.
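
The overlapping-match behaviour mentioned in the Description can be illustrated with a minimal base R sketch. The function name overlapping_pos is hypothetical and this is not claimed to be seqinr's actual implementation; it only shows one way to get non-disjoint positions from repeated regexpr calls, restarting the search one character after each hit. Because each search runs on a shortened string, anchors such as ^ would not behave as they do on the full sequence.

```r
# Hypothetical helper, not part of seqinr: report the start position of
# every match of `pattern` in `text`, including overlapping ones.
overlapping_pos <- function(pattern, text, ...) {
  positions <- integer(0)
  offset <- 0L                      # characters already cut from the front
  repeat {
    if (!nzchar(text)) break        # nothing left to scan
    hit <- regexpr(pattern, text, ...)[1]
    if (hit == -1L) break           # no further match
    positions <- c(positions, offset + hit)
    offset <- offset + hit
    # Restart just after the start of this match, so that a match
    # overlapping the current one can still be found.
    text <- substr(text, hit + 1L, nchar(text))
  }
  positions
}

overlapping_pos("toto", "totototo")  # 1 3 5
overlapping_pos("ga", "tatagaga")    # 5 7
```

Extra arguments such as perl = TRUE or fixed = TRUE are simply forwarded to regexpr in this sketch.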

References

citation("seqinr")

See Also

regexpr

Examples

myseq <- "tatagaga"
words.pos("t", myseq)   # Should be 1 3
words.pos("tag", myseq) # Should be 3
words.pos("ga", myseq)  # Should be 5 7
# How to specify an ambiguous base? Use a character class, e.g. to look for YpR motifs:
words.pos("[ct][ag]", myseq) # Should be 1 3
#
# Show the difference with gregexpr:
#
words.pos("toto", "totototo")           # 1 3 5 (three overlapping matches)
unlist(gregexpr("toto",  "totototo")) # 1 5    (two disjoint matches)
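
Writing character classes by hand becomes tedious for longer degenerate motifs. The sketch below, which uses base R only, expands IUPAC nucleotide ambiguity codes into a regular expression. The helper name iupac_to_regex is hypothetical and not part of seqinr. To keep the example runnable without the package, the search itself uses gregexpr with a Perl lookahead, which also reports overlapping matches; the resulting pattern could presumably be passed to words.pos in the same way.

```r
# Hypothetical helper, not part of seqinr: translate a motif written with
# IUPAC nucleotide codes into a regular expression of character classes.
iupac_to_regex <- function(motif) {
  classes <- c(a = "a", c = "c", g = "g", t = "t",
               r = "[ag]", y = "[ct]", s = "[cg]", w = "[at]",
               k = "[gt]", m = "[ac]", b = "[cgt]", d = "[agt]",
               h = "[act]", v = "[acg]", n = "[acgt]")
  symbols <- strsplit(tolower(motif), "")[[1]]
  unknown <- setdiff(symbols, names(classes))
  if (length(unknown) > 0) {
    stop("unsupported symbol(s): ", paste(unknown, collapse = " "))
  }
  paste(classes[symbols], collapse = "")
}

ypr <- iupac_to_regex("YR")
ypr                                   # "[ct][ag]"

# A zero-width lookahead lets gregexpr report every start position,
# including those of overlapping matches.
hits <- gregexpr(paste0("(?=", ypr, ")"), "tatagaga", perl = TRUE)[[1]]
as.integer(hits)                      # 1 3
```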