getFastaFromBed: Get fasta information based on locations in bed-format

Description

For a given fasta and a bed file this function can extract the nucleotide sequences and stores them as fasta file.

Usage

getFastaFromBed(bed, species=NULL, assembly = NULL, fastaFolder=NULL,
                  verbose=TRUE, export=NULL, fileName=NULL)

Value

An fa object containing the nucleotide sequences in fasta format.

Arguments

bed: The location in bed format, see details.
species: Define the species.
assembly: Assembly identifier.
fastaFolder: Location of the fasta files.
verbose: Logical, should informative status updates be given.
export: Foldername.
fileName: Filename to store the FA object.

Author

Daniel Fischer

Details

Function expects as an input a data.frame in bed format. This means, the first column should contain the chromosome, the second the start-coordinates, the third the end-coordinates. The forth column contains the ID of the loci.

If a standard species is used (as defined in the species data frame), the function automatically downloads the required files from NCBI, takes the loci and extracts then the nucleotide sequences from it. If the corresponding assemly is not available from NCBI an own fasta file can be provided. For that the fa-file needs to be in the fastaFolder and follow the same naming system as the NCBI files are labelled. In that case, the function suggests the correct filename for an unknown assembly.

The export function, specifies then a folder to where the fasta file should be stored. If no filename is provided, the filename is then the object name passed to the bed function.

Examples

Run this code

if (FALSE) {

myBed <- data.frame(chr=c(1,2),
                    start=c(235265,12356742),
                    end=c(435265,12386742),
                    gene=c("LOC1", "LOC2"))

myFA <- getFastaFromBed(myBed, species="Homo sapiens", fastaFolder="/home/user/fasta/", export=TRUE)
}

Run the code above in your browser using DataLab