BSgenome (version 1.40.1)

available.genomes: Find available/installed genomes

Description

available.genomes gets the list of BSgenome data packages that are available in the Bioconductor repositories for your version of R/Bioconductor.

installed.genomes gets the list of BSgenome data packages that are currently installed on your system.

getBSgenome searches the installed BSgenome data packages for the specified genome and returns it as a BSgenome object.

Usage

available.genomes(splitNameParts=FALSE, type=getOption("pkgType"))
installed.genomes(splitNameParts=FALSE)
getBSgenome(genome, masked=FALSE)

Arguments

splitNameParts
TRUE or FALSE. Whether to split the package names into their parts. If TRUE, the result is returned as a data frame with 5 columns.
type
Character string indicating the type of package ("source", "mac.binary" or "win.binary") to look for.
genome
A BSgenome object, or the full name of an installed BSgenome data package, or a short string specifying a genome assembly (a.k.a. provider version) that refers unambiguously to an installed BSgenome data package.
masked
TRUE or FALSE. Whether to search for the masked BSgenome object (i.e. the object that contains the masked sequences) or not (the default).

Value

For available.genomes and installed.genomes: by default (i.e. if splitNameParts=FALSE), a character vector containing the names of the BSgenome data packages that are available (for available.genomes) or currently installed (for installed.genomes). If splitNameParts=TRUE, the list of packages is returned in a data frame with one row per package and the following columns: pkgname (character), organism (factor), provider (factor), provider_version (character), and masked (logical).

For getBSgenome: the BSgenome object containing the sequences for the specified genome. An error is raised if the object cannot be found in the BSgenome data packages currently installed.
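Because getBSgenome raises an error when no matching package is installed, a script may want to catch that error instead of stopping. The following is a minimal sketch; try_get_genome is a hypothetical helper introduced here for illustration and is not part of BSgenome.

```r
# Hypothetical wrapper (not part of BSgenome): return NULL instead of
# stopping when the genome cannot be found among the installed packages.
try_get_genome <- function(genome, masked = FALSE) {
  tryCatch(
    BSgenome::getBSgenome(genome, masked = masked),
    error = function(e) {
      # Report why the lookup failed, then signal the failure with NULL.
      message("Could not load '", genome, "': ", conditionMessage(e))
      NULL
    }
  )
}

# "no_such_assembly" is a made-up name, so the lookup is expected to fail.
genome <- try_get_genome("no_such_assembly")
is.null(genome)
```

Returning NULL is only one possible convention; depending on the script, re-raising the error with a clearer message may be preferable.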

Details

A BSgenome data package contains the full genome sequences for a given organism.

Its name typically has 4 parts (5 parts if it's a masked BSgenome data package i.e. if it contains masked sequences) separated by a dot e.g. BSgenome.Mmusculus.UCSC.mm10 or BSgenome.Mmusculus.UCSC.mm10.masked:

  1. The 1st part is always BSgenome.
  2. The 2nd part is the name of the organism in abbreviated form e.g. Mmusculus, Hsapiens, Celegans, Scerevisiae, Ecoli, etc...
  3. The 3rd part is the name of the organisation who provided the genome sequences. We formally refer to it as the provider of the genome. E.g. UCSC, NCBI, TAIR, etc...
  4. The 4th part is the release string or number used by this organisation for this particular genome assembly. We formally refer to it as the provider version of the genome. E.g. hg38, GRCh38, hg19, mm10, susScr3, etc...
  5. If the package contains masked sequences, its name has the .masked suffix added to it, which is typically the 5th part.

A BSgenome data package contains a single top-level object (a BSgenome object) named like the package itself that can be used to access the genome sequences.
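The naming scheme above can be illustrated with base R alone. This is a sketch under the assumption that the name follows the typical 4- or 5-part layout; split_bsgenome_name is a hypothetical helper, not a BSgenome function, and it is not how splitNameParts=TRUE is implemented.

```r
# Hypothetical helper (not part of BSgenome): split a BSgenome data package
# name into the parts described above, assuming the typical dot-separated layout.
split_bsgenome_name <- function(pkgname) {
  parts <- strsplit(pkgname, ".", fixed = TRUE)[[1]]
  # Require at least the 4 standard parts, starting with "BSgenome".
  stopifnot(length(parts) >= 4, parts[1] == "BSgenome")
  list(
    pkgname          = pkgname,
    organism         = parts[2],
    provider         = parts[3],
    provider_version = parts[4],
    masked           = length(parts) == 5 && parts[5] == "masked"
  )
}

str(split_bsgenome_name("BSgenome.Mmusculus.UCSC.mm10"))
str(split_bsgenome_name("BSgenome.Mmusculus.UCSC.mm10.masked"))
```

For real work, available.genomes(splitNameParts=TRUE) and installed.genomes(splitNameParts=TRUE) already return these parts as data frame columns.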

Examples

    ## ---------------------------------------------------------------------
    ## available.genomes() and installed.genomes()
    ## ---------------------------------------------------------------------
    
    library(BSgenome)
    
    # What genomes are currently installed:
    installed.genomes()
    
    # What genomes are available:
    available.genomes()
    
    # Split the package names in parts:
    av_gen <- available.genomes(splitNameParts=TRUE)
    table(av_gen$organism)
    table(av_gen$provider)
    
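    # Make your choice and install with:
    library(BiocInstaller)
    biocLite("BSgenome.Scerevisiae.UCSC.sacCer1")
    # Note: on Bioconductor 3.8 and later, BiocInstaller is replaced by
    # BiocManager; use BiocManager::install("BSgenome.Scerevisiae.UCSC.sacCer1")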
    
    # Have a coffee 8-)
    
    # Load the package and display the index of sequences for this genome:
    library(BSgenome.Scerevisiae.UCSC.sacCer1)
    Scerevisiae  # same as BSgenome.Scerevisiae.UCSC.sacCer1
    
    ## ---------------------------------------------------------------------
    ## getBSgenome()
    ## ---------------------------------------------------------------------
    
    ## Specify the full name of an installed BSgenome data package:
    genome <- getBSgenome("BSgenome.Celegans.UCSC.ce2")
    genome
    
    ## Specify a genome assembly (a.k.a. provider version):
    genome <- getBSgenome("hg38")
    class(genome)  # BSgenome object
    providerVersion(genome)
    genome$chrM
    
    genome <- getBSgenome("hg38", masked=TRUE)
    class(genome)  # MaskedBSgenome object
    providerVersion(genome)
    genome$chr22
    
