GeneRegionScan (version 1.28.0)

findSequenceInGenome: Find a sequence in genome

Description

Wrapper around matchPDict that will accept a list of sequences and check if they are present in a given genome. Takes a long time to run.

Usage

findSequenceInGenome(sequences, genome="BSgenome.Hsapiens.UCSC.hg19", verbose=TRUE, directions=c("matchForwardSense", "matchForwardAntisense", "matchReverseSense", "matchReverseAntisense"))

Arguments

sequences
vector of character strings to scan. Should only contain A, C, G and T. Will be converted to DNAString.
genome
character string with the name of the BSgenome data package in which the sequences should be found. Defaults to the human genome ("BSgenome.Hsapiens.UCSC.hg19").
verbose
TRUE or FALSE.
directions
character vector with elements from c("matchForwardSense", "matchForwardAntisense", "matchReverseSense", "matchReverseAntisense"). Defines which directions (complementary and reverse mirrorings) should be scanned. Defaults to all directions.
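
As a rough illustration of what "complementary and reverse mirrorings" of a sequence look like, the base R sketch below checks that a sequence contains only A, C, G and T and then derives its reverse, complement and reverse complement. The helpers `is_valid_dna`, `reverse_string`, `complement_string` and `orientations` are hypothetical names made up for this example and are not part of GeneRegionScan. Which of the four `directions` values corresponds to which orientation is not stated on this page, so the list elements are named descriptively instead.

```r
# Hypothetical helpers for illustration only; not GeneRegionScan code.

# TRUE for strings made up only of A, C, G and T
is_valid_dna <- function(x) grepl("^[ACGT]+$", x)

# Reverse the characters of a single string
reverse_string <- function(x) {
  paste(rev(strsplit(x, "")[[1]]), collapse = "")
}

# Complement each base (A <-> T, C <-> G)
complement_string <- function(x) chartr("ACGT", "TGCA", x)

# Build the four orientations of one sequence
orientations <- function(x) {
  stopifnot(length(x) == 1, is_valid_dna(x))
  list(
    original           = x,
    reverse            = reverse_string(x),
    complement         = complement_string(x),
    reverse_complement = reverse_string(complement_string(x))
  )
}

orientations("CTGGCG")
```

Scanning fewer orientations via `directions` should reduce the amount of matching work, which matters given the long run time.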

Value

"entrynumber","hitposInChr","chr", and "sequence" describing, respectively: the index of the sequence match, the position in the chromosome at which it was found, which chromosome it was found on, the sequence itself

Details

This function will take quite a while to run, so if you have many sequences, overnight runs are recommended. BSgenome contains some alternative versions of chromosomes. They are marked with an underscore. This function automatically disregards chromosome names with an underscore, and this is known to work for the human genome. Nevertheless, check the output printed to the terminal to confirm that all chromosomes are included.
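
The underscore rule described above can be sketched in base R as follows. This is only an illustration of the filtering idea, not the function's actual code. The vector `chr_names` is a hand-written sample in the style of UCSC hg19 sequence names; with a loaded BSgenome object the names would normally come from `seqnames()`.

```r
# Illustrative subset of sequence names (assumed sample, written by hand)
chr_names <- c("chr1", "chr2", "chr6_apd_hap1", "chrX", "chrUn_gl000211", "chrM")

# Keep only names that do not contain an underscore
main_chr <- chr_names[!grepl("_", chr_names, fixed = TRUE)]

# Names that would be disregarded under this rule
skipped_chr <- setdiff(chr_names, main_chr)

print(main_chr)
print(skipped_chr)
```

Comparing a list like `main_chr` against the chromosomes reported in the terminal output is one way to check that nothing expected was left out, particularly for non-human genomes whose naming may differ.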

See Also

BSgenome, matchPDict, excludeDoubleMatchingProbes

Examples

## Not run: 
# You can run this, but it takes quite a lot of time
library(GeneRegionScan)
example <- findSequenceInGenome("CTGGCGAGCAGCGAATAATGGTTT")
## End(Not run)
