intervals (version 0.15.5)

sgd: Yeast gene model sample data

Description

This data set contains a data frame describing a subset of the chromosome feature data represented in the Fall 2007 version of saccharomyces_cerevisiae.gff, available for download from the Saccharomyces Genome Database (https://www.yeastgenome.org:443/).

Usage

data(sgd)

Format

A data frame with 14080 observations on the following 8 variables.

SGDID

SGD feature ID.

type

Only four feature types have been retained: "CDS", "five_prime_UTR_intron", "intron", and "ORF". Note that an "ORF" corresponds to a whole gene, while a "CDS" corresponds to an exon. S. cerevisiae does not, however, have many multi-exonic genes.

feature_name

A character vector

parent_feature_name

The feature_name of the larger element to which the current feature belongs. All retained "CDS" entries, for example, belong to an "ORF" entry.

chr

The chromosome on which the feature occurs.

start

Feature start base.

stop

Feature stop base.

strand

Is the feature on the Watson or Crick strand?
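
The relationship between feature_name and parent_feature_name can be illustrated with a small base R sketch. The toy data frame below is hypothetical: its rows, IDs, and coordinates are invented to mimic the column layout described above and are not taken from the real sgd data. It assumes start is never greater than stop, and it uses match() to look up each "CDS" entry's parent "ORF".

```r
# Hypothetical toy rows mimicking the sgd columns; all values are invented
# for illustration and do not come from the real data set.
toy <- data.frame(
  SGDID               = c("S1", "S2", "S3", "S4"),
  type                = c("ORF", "CDS", "CDS", "intron"),
  feature_name        = c("geneA", "geneA_cds1", "geneA_cds2", "geneA_intron"),
  parent_feature_name = c(NA, "geneA", "geneA", "geneA"),
  chr                 = "chr01",
  start               = c(100L, 100L, 300L, 201L),
  stop                = c(500L, 200L, 500L, 299L),
  strand              = "W",
  stringsAsFactors    = FALSE
)

orf <- subset(toy, type == "ORF")
cds <- subset(toy, type == "CDS")

# For each CDS, find the row of its parent ORF via parent_feature_name.
idx <- match(cds$parent_feature_name, orf$feature_name)
cds$orf_start <- orf$start[idx]
cds$orf_stop  <- orf$stop[idx]

# TRUE when a CDS lies entirely within its parent ORF.
cds$inside <- cds$start >= cds$orf_start & cds$stop <= cds$orf_stop

# Number of CDS entries (exons) recorded for each parent ORF.
exons_per_orf <- table(cds$parent_feature_name)
```

With the real data, the same lookup could be applied after data(sgd), though the column types there (for example, factors versus character vectors) may differ from this toy frame.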

Examples

# An example to compute "promoters", defined to be the 500 bases
# upstream from an ORF annotation, provided these bases don't intersect
# another ORF. See the Format section above for more details on the
# annotation set.

library( intervals )

use_chr <- "chr01"

data( sgd )
sgd <- subset( sgd, chr == use_chr )

orf <- Intervals(
                 subset( sgd, type == "ORF", c( "start", "stop" ) ),
                 type = "Z"
                 )
rownames( orf ) <- subset( sgd, type == "ORF" )$feature_name

W <- subset( sgd, type == "ORF", "strand" ) == "W"

promoters_W <- Intervals(
                         cbind( orf[W,1] - 500, orf[W,1] - 1 ),
                         type = "Z"
                         )

promoters_W <- interval_intersection(
                                     promoters_W,
                                     interval_complement( orf )
                                     )

# Many Watson-strand genes have another ORF upstream at a distance of
# less than 500 bp

hist( size( promoters_W ) )

# All CDS entries are completely within their corresponding ORF entry.

cds_W <- Intervals(
                 subset( sgd, type == "CDS" & strand == "W", c( "start", "stop" ) ),
                 type = "Z"
                 )
rownames( cds_W ) <- NULL

interval_intersection( cds_W, interval_complement( orf[W,] ) )