BSgenome-utils: BSgenome utilities

Description

Utilities for BSgenome objects.

Usage

matchPWM(pwm, subject, min.score = "80%", exclude = "", maskList = logical(0))
countPWM(pwm, subject, min.score = "80%", exclude = "", maskList = logical(0))
vmatchPattern(pattern, subject, max.mismatch = 0, min.mismatch = 0, with.indels = FALSE, fixed = TRUE, algorithm = "auto", exclude = "", maskList = logical(0), userMask = RangesList(), invertUserMask = FALSE)
vcountPattern(pattern, subject, max.mismatch = 0, min.mismatch = 0, with.indels = FALSE, fixed = TRUE, algorithm = "auto", exclude = "", maskList = logical(0), userMask = RangesList(), invertUserMask = FALSE)
vmatchPDict(pdict, subject, max.mismatch = 0, min.mismatch = 0, fixed = TRUE, algorithm = "auto", verbose = FALSE, exclude = "", maskList = logical(0))
vcountPDict(pdict, subject, max.mismatch = 0, min.mismatch = 0, fixed = TRUE, algorithm = "auto", collapse = FALSE, weight = 1L, verbose = FALSE, exclude = "", maskList = logical(0))

Arguments

pwm

A numeric matrix with row names A, C, G and T representing a Position Weight Matrix.

pattern

A DNAString object containing the pattern sequence.

pdict

A DNAStringSet object containing the pattern sequences.

subject

A BSgenome object containing the subject sequences.

min.score

The minimum score for counting a match. Can be given as a character string containing a percentage (e.g. "85%") of the highest possible score or as a single number.
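
As a rough illustration of how a percentage relates to an absolute cutoff, the base R sketch below computes the highest possible score of a toy PWM (the sum of the column maxima) and scales it. The matrix values and the helper resolve_min_score are made up for this example; it is not the package's own code.

```r
# Toy PWM: rows are A, C, G, T; columns are motif positions (values are made up).
pwm <- matrix(
  c(0.7, 0.1, 0.1, 0.1,   # position 1
    0.1, 0.6, 0.2, 0.1,   # position 2
    0.2, 0.2, 0.5, 0.1),  # position 3
  nrow = 4,
  dimnames = list(c("A", "C", "G", "T"), NULL)
)

# Highest possible score: the best-scoring letter at every position.
max_score <- sum(apply(pwm, 2, max))

# Hypothetical helper: turn "85%" or a plain number into an absolute cutoff.
resolve_min_score <- function(min.score, max_score) {
  if (is.character(min.score)) {
    percent <- as.numeric(sub("%$", "", min.score))
    return(max_score * percent / 100)
  }
  min.score
}

cutoff_pct <- resolve_min_score("85%", max_score)  # 85% of the maximum
cutoff_num <- resolve_min_score(1.2, max_score)    # a number is used as given
```

Under this reading, a percentage adapts to each PWM's score range, while a single number applies the same absolute cutoff regardless of the matrix.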

max.mismatch, min.mismatch

The maximum and minimum number of mismatching letters allowed (see ?`lowlevel-matching` for the details). If non-zero, an inexact matching algorithm is used.
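
To make the two bounds concrete, here is a naive sliding-window sketch in base R that keeps the start positions whose mismatch count falls between min.mismatch and max.mismatch. The helper naive_match and the toy sequences are hypothetical and only illustrate the idea; the algorithms the package actually uses are not shown here.

```r
# Hypothetical helper: report start positions where the number of mismatching
# letters lies between min.mismatch and max.mismatch (inclusive).
naive_match <- function(pattern, subject, max.mismatch = 0, min.mismatch = 0) {
  p <- strsplit(pattern, "")[[1]]
  s <- strsplit(subject, "")[[1]]
  n_windows <- length(s) - length(p) + 1
  if (n_windows < 1) return(integer(0))
  starts <- seq_len(n_windows)
  # Count letter-by-letter differences in each window of the subject.
  mismatches <- vapply(starts, function(i) {
    sum(s[i:(i + length(p) - 1)] != p)
  }, integer(1))
  starts[mismatches >= min.mismatch & mismatches <= max.mismatch]
}

exact    <- naive_match("ACGT", "ACGTACCTACGA")
inexact  <- naive_match("ACGT", "ACGTACCTACGA", max.mismatch = 1)
one_only <- naive_match("ACGT", "ACGTACCTACGA", max.mismatch = 1, min.mismatch = 1)
```

Setting min.mismatch above zero, as in the last call, drops the exact hit and keeps only the windows that differ from the pattern.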

with.indels

If TRUE then indels are allowed. In that case, min.mismatch must be 0 and max.mismatch is interpreted as the maximum "edit distance" allowed between any pattern and any of its matches (see ?`matchPattern` for the details).

fixed

If FALSE then IUPAC extended letters are interpreted as ambiguities (see ?`lowlevel-matching` for the details).

algorithm

For vmatchPattern and vcountPattern one of the following: "auto", "naive-exact", "naive-inexact", "boyer-moore", "shift-or", or "indels".

For vmatchPDict and vcountPDict one of the following: "auto", "naive-exact", "naive-inexact", "boyer-moore", or "shift-or".

collapse, weight

Ignored arguments.

verbose

TRUE or FALSE.

exclude

A character vector with strings that will be used to filter out chromosomes whose names match these strings.
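
The filtering can be pictured with the base R sketch below. It assumes that each string is matched against the chromosome names as a regular expression and that the default "" excludes nothing; both points are assumptions, and filter_seqnames and the chromosome names are hypothetical.

```r
# Hypothetical helper: drop sequence names matching any of the exclude strings.
# Assumption: strings act as regular expressions; "" (the default) excludes nothing.
filter_seqnames <- function(seqnames, exclude = "") {
  exclude <- exclude[nzchar(exclude)]
  for (pattern in exclude) {
    seqnames <- seqnames[!grepl(pattern, seqnames)]
  }
  seqnames
}

chroms <- c("chrI", "chrII", "chrX", "chrM", "chrUn_random")
kept_default  <- filter_seqnames(chroms)                             # nothing dropped
kept_filtered <- filter_seqnames(chroms, exclude = c("M", "random")) # drops chrM and chrUn_random
```

Under the same assumptions, passing exclude = "M" to one of the functions on this page would be expected to skip a mitochondrial chromosome named chrM.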

maskList

A named logical vector of the mask states to use with a BSgenome object. When the bsapply function is used, the masks are set to the states in this vector.

userMask

A RangesList, containing a mask to be applied to each chromosome. See bsapply.

invertUserMask

Whether the userMask should be inverted.

Value

A GRanges object for matchPWM with two elementMetadata columns: "score" (numeric), and "string" (DNAStringSet).

A GRanges object for vmatchPattern.

A GRanges object for vmatchPDict with one elementMetadata column: "index", which represents a mapping to a position in the original pattern dictionary.

A data.frame object for countPWM and vcountPattern with three columns: "seqname" (factor), "strand" (factor), and "count" (integer).

A DataFrame object for vcountPDict with four columns: "seqname" ('factor' Rle), "strand" ('factor' Rle), "index" (integer) and "count" ('integer' Rle). As with vmatchPDict, the index column represents a mapping to a position in the original pattern dictionary.

Examples

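  # Load the C. elegans genome (ce2) and the HNF4alpha example sequences
  library(BSgenome.Celegans.UCSC.ce2)
  data(HNF4alpha)

  # Build a PWM from the sequences, then find and count its matches
  pwm <- PWM(HNF4alpha)
  matchPWM(pwm, Celegans)
  countPWM(pwm, Celegans)

  # Match the consensus sequence; with fixed = "subject", only the IUPAC
  # ambiguity letters in the pattern are interpreted as wildcards
  pattern <- consensusString(HNF4alpha)
  vmatchPattern(pattern, Celegans, fixed = "subject")
  vcountPattern(pattern, Celegans, fixed = "subject")

  # Match a small pattern dictionary made of the first 10 sequences
  vmatchPDict(HNF4alpha[1:10], Celegans)
  vcountPDict(HNF4alpha[1:10], Celegans)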
