width(x)
conservedPositions(x)
wmName(x)
wmFilename(x)
wmLocations(x)
wmStrictLocations(x)
show(object)
## 'object' is a WeightMatrix object
wmScore(object, dnaseqs)
## 'object' is the file name of a weight matrix
wmScore(object, dnaseqs, locations, strictLocations)
x: A WeightMatrix object.
object: A WeightMatrix object or the file name of a weight matrix.
dnaseqs: Either a character vector or a DNAStringSet object, both of which store nucleotide sequences to be scored using the input WeightMatrix object.
locations: Argument passed on to the readWm() function call. Please consult the manual page of readWm().
strictLocations: Argument passed on to the readWm() function call. Please consult the manual page of readWm().
The WeightMatrix class and associated methods enable the VariantFiltering package to score synonymous and intronic genetic variants for potential cryptic splice sites. The class and its methods are nevertheless exposed to the end user, since they could be useful for other analysis purposes.

The VariantFiltering package contains two weight matrices, one for 5'ss and another for 3'ss, which were built by a statistical method that accounts for dependencies between the splice site positions, minimizing the rate of false positive predictions. Concretely, the method builds these models by inclusion-driven learning of Bayesian networks; further details can be found in the paper of Castelo and Guigó (2004).
The function readWm() reads a weight matrix stored in a text file in a particular format and returns a WeightMatrix object. See the .ibn files located in the extdata folder of the VariantFiltering package for an example of this format, which is specifically designed to store weights that may depend on the occurrence of nucleotides at other positions of the matrix.

In addition to this specific weight matrix format, the function readWm() can also read the MEME motif format specified at http://meme-suite.org/doc/meme-format.html. Under this format, the function currently reads only one matrix per file. When the values correspond to probabilities (specified under the motif letter-probability matrix line), they are automatically converted to weights using the background frequencies specified in the background letter frequencies line or, if that line is absent, a uniform distribution.
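The exact formula used to turn probabilities into weights is not stated here. As a rough illustration only, the following base R sketch assumes the common log-odds form, log2(probability / background). The helper name probsToWeights, the toy matrix, and the base-2 logarithm are all assumptions made for this example and are not taken from readWm().

```r
# Hypothetical helper, not part of VariantFiltering: converts a
# letter-probability matrix (rows = positions, columns = A, C, G, T)
# into log-odds weights against a background distribution.
# ASSUMPTION: weights are log2(p / background); readWm() may differ.
probsToWeights <- function(probs, bg = NULL) {
  nt <- c("A", "C", "G", "T")
  if (is.null(bg))                      # no background frequencies given:
    bg <- setNames(rep(0.25, 4), nt)    # fall back to a uniform distribution
  # divide every column by the background frequency of its nucleotide
  log2(sweep(probs[, nt, drop = FALSE], 2, bg[nt], "/"))
}

# toy two-position motif
probs <- matrix(c(0.5,  0.25, 0.125, 0.125,
                  0.25, 0.25, 0.25,  0.25),
                nrow = 2, byrow = TRUE,
                dimnames = list(NULL, c("A", "C", "G", "T")))

wUniform <- probsToWeights(probs)
wSkewed  <- probsToWeights(probs,
                           bg = c(A = 0.5, C = 0.125, G = 0.125, T = 0.25))
```

Under the uniform background the second position carries no information, so all of its weights are zero; with a skewed background the same probabilities yield non-zero weights.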
The method wmScore() scores one or more nucleotide sequences using the input WeightMatrix object. When the input object is the file name of a weight matrix, the function readWm() is first called to read that weight matrix and internally create a WeightMatrix object. Calling wmScore() this way is required when using it in parallel, since WeightMatrix objects are currently not serializable.

If the sequences are longer than the width of the weight matrix, this function scores every possible site within those sequences. It returns a list where each element is a vector with the calculated scores of the corresponding DNA sequence. When a score cannot be calculated because a conserved position does not occur in the sequence (e.g., absence of a GT dinucleotide with the 5'ss weight matrix), NA is returned as the corresponding score value.
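To make the site-by-site scoring and the NA behaviour concrete, here is a minimal base R sketch. It uses a hypothetical three-position matrix with independent position weights and two conserved positions. The names toyWeights, toyConserved, scoreSites and scoreAll are invented for this illustration, and it does not reproduce how wmScore() is implemented (in particular, it ignores dependencies between positions).

```r
# Hypothetical toy model, not the package's implementation: a 3-position
# weight matrix whose positions 2 and 3 are fully conserved ("G" then "T").
toyWeights <- matrix(c(1, -1, 0, -2,   # position 1: weights for A, C, G, T
                       0,  0, 0,  0,   # position 2: conserved, adds nothing
                       0,  0, 0,  0),  # position 3: conserved, adds nothing
                     nrow = 3, byrow = TRUE,
                     dimnames = list(NULL, c("A", "C", "G", "T")))
toyConserved <- c("2" = "G", "3" = "T")

# score every possible site of one sequence
scoreSites <- function(weights, conserved, dnaseq) {
  w <- nrow(weights)
  nsites <- nchar(dnaseq) - w + 1
  if (nsites < 1) return(numeric(0))     # sequence shorter than the matrix
  pos <- as.integer(names(conserved))
  vapply(seq_len(nsites), function(start) {
    site <- strsplit(substr(dnaseq, start, start + w - 1), "")[[1]]
    if (any(site[pos] != conserved))     # a conserved nucleotide is missing
      return(NA_real_)
    # sum the weight of the observed nucleotide at each position
    sum(weights[cbind(seq_len(w), match(site, colnames(weights)))])
  }, numeric(1))
}

# one vector of scores per input sequence, returned as a list
scoreAll <- function(weights, conserved, dnaseqs)
  lapply(dnaseqs, scoreSites, weights = weights, conserved = conserved)

scores <- scoreAll(toyWeights, toyConserved, c("AGT", "AGTCGT", "AAA"))
```

In this toy setting the six-nucleotide sequence yields four scores, one per possible site, and sites lacking the conserved GT are reported as NA.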
The method width() takes a WeightMatrix object as input and returns the number of positions of the weight matrix.

The method conservedPositions() takes a WeightMatrix object as input and returns the number of fully conserved positions in the weight matrix.
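As an informal illustration of these two quantities, the base R sketch below counts the positions of a hypothetical probability matrix and how many of them are fully conserved. It assumes that "fully conserved" means a single nucleotide carries all the probability at that position. The matrix and variable names are made up, and this is not how WeightMatrix objects store their data.

```r
# Hypothetical probability matrix (rows = positions, columns = A, C, G, T);
# an illustration only, not the internal layout of a WeightMatrix object.
probs <- matrix(c(0.3, 0.2, 0.3, 0.2,
                  0,   0,   1,   0,     # always G
                  0,   0,   0,   1,     # always T
                  0.6, 0.1, 0.2, 0.1),
                nrow = 4, byrow = TRUE,
                dimnames = list(NULL, c("A", "C", "G", "T")))

# number of positions, analogous to what width() reports
nPositions <- nrow(probs)

# ASSUMPTION: a position is fully conserved when one nucleotide has
# probability 1; analogous to what conservedPositions() reports
nConserved <- sum(apply(probs, 1, max) == 1)
```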
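library(VariantFiltering)

# read the 5'ss weight matrix shipped with the package
wm <- readWm(file.path(system.file("extdata", package="VariantFiltering"),
                       "hsap.donors.hcmc10_15_1.ibn"),
             locations="fiveSpliceSite", strictLocations=TRUE)
wm

# accessor methods
wmFilename(wm)
width(wm)
wmName(wm)
wmLocations(wm)
wmStrictLocations(wm)
conservedPositions(wm)

# score a sequence that contains a GT dinucleotide
wmScore(wm, "CAGGTAGGA")
# this sequence has no GT dinucleotide
wmScore(wm, "CAGGAAGGA")
wmScore(wm, "CAGGTCCTG")
# a longer sequence
wmScore(wm, "CAGGTCGTGGAG")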