refGenome (version 1.7.7)

read.gtf: Reading and parsing GTF files into refGenome objects.

Description

Reads and parses the content of a GTF file. The parsed content is written into the provided object, specifically into the environment stored in its 'ev' slot, so the object is modified by reference. The function writes two tables: 'gtf', containing the main file content, and 'genes', containing data from features of type 'gene'.

Usage

read.gtf(object, filename="transcripts.gtf", sep = "\t",
            useBasedir=TRUE, comment.char = "#", progress=100000L, ...)

Arguments

object

refGenome object. Will contain the extracted data.

filename

(Base-)Name of GTF file.

sep

Character: Column separator in GTF file. Default is '\t' (tab).

useBasedir

Logical: Should basedir (from the refGenome object) be prepended to filename?

comment.char

Character: Lines beginning with this character will be skipped.

progress

Integer: The parsing routine prints a progress message each time the given number of lines has been read.

...

Currently unused.

Value

None. The provided object is filled with the parsed data. Two tables are generated: 'gtf' and 'genes'. The leading columns of the gtf table are fixed (an 'id' column plus the eight standard GTF fields). Their content is described in the following table.

Column    Description                                      Type
id        Numeric index for unique site.                   Integer
seqid     Chromosome identifier.                           Character
source    Program which generated data.
feature   Feature type (e.g. 'exon', 'CDS').               Character
start     Start position of feature (1-based).             Integer
end       End position of feature (inclusive).             Integer
score     Value between 0 and 1000 ("." for no score).     Character
strand    '+', '-' or '.'.                                 Character
frame     0-2 for coding exons, '.' otherwise.             Character
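
To make the column layout concrete, the following base-R sketch reads a small GTF fragment into a data frame with the column names listed above. It is only an illustration and not how read.gtf is implemented: the sample lines, the 'attributes' column name and the use of read.table are assumptions made for this example.

```r
# Hypothetical GTF fragment (illustrative values, not taken from the package data)
gtf_lines <- c(
  "# comment line, skipped because comment.char is '#'",
  "1\thavana\tgene\t11869\t14409\t.\t+\t.\tgene_id \"ENSG00000223972\"; gene_name \"DDX11L1\";",
  "1\thavana\texon\t11869\t12227\t.\t+\t.\tgene_id \"ENSG00000223972\"; gene_name \"DDX11L1\";"
)

# Read the nine tab-separated GTF columns; quote = "" keeps the
# double quotes inside the attribute column as literal characters
gtf <- read.table(
  text = gtf_lines,
  sep = "\t",
  comment.char = "#",
  quote = "",
  stringsAsFactors = FALSE,
  colClasses = c("character", "character", "character", "integer", "integer",
                 "character", "character", "character", "character"),
  col.names = c("seqid", "source", "feature", "start", "end",
                "score", "strand", "frame", "attributes")
)

# Prepend a running integer index, analogous to the 'id' column described above
gtf <- cbind(id = seq_len(nrow(gtf)), gtf)
```

Keeping score and frame as character preserves the '.' placeholder, which matches the types given in the table.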

Details

GTF is an extension of the GFF file format. A GTF file contains tabular data: nine columns separated by a tab delimiter. The last column expands into a list of attributes, separated by a semicolon and exactly one space. Each attribute consists of a type-value pair, separated by one space. Enclosing quotation marks (") around attribute values are skipped during import.
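
The attribute format described above can be illustrated with a short base-R sketch. The helper parse_gtf_attributes is hypothetical and written only for this example; it is not part of refGenome and does not reflect the package's internal parser.

```r
# Hypothetical helper: split one GTF attribute string into a named character vector
parse_gtf_attributes <- function(attr_string) {
  # Attributes are separated by a semicolon followed by whitespace
  fields <- strsplit(attr_string, ";\\s*")[[1]]
  fields <- trimws(fields)
  fields <- fields[nzchar(fields)]
  # The type is the text before the first space, the value is the text after it
  types <- sub(" .*$", "", fields)
  values <- sub("^[^ ]+ ", "", fields)
  # Drop the enclosing double quotes around each value
  values <- gsub("^\"|\"$", "", values)
  setNames(values, types)
}

# Illustrative attribute string (values are assumptions for this example)
attrs <- parse_gtf_attributes(
  "gene_id \"ENSG00000223972\"; gene_name \"DDX11L1\"; exon_number \"1\";"
)
```

Individual values can then be looked up by type, for example attrs[["gene_name"]].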

References

UCSC Genome Bioinformatics: Data File Formats. http://genome.ucsc.edu/FAQ/FAQformat.html#format3

Examples

# NOT RUN {
##-------------------------------------##
## Ensembl
##-------------------------------------##
ef <- system.file("extdata", package="refGenome")
en <- ensemblGenome(ef)
read.gtf(en, "hs.ensembl.76.small.gtf")
# }