microseq (version 2.1.6)

readGFF: Reading and writing GFF-tables

Description

Reading or writing a GFF-table from/to file.

Usage

readGFF(in.file)
writeGFF(gff.table, out.file)

Value

readGFF returns a gff.table with the columns described under Details.

writeGFF writes the supplied gff.table to a text-file.

Arguments

in.file

Name of file with a GFF-table.

gff.table

A table (tibble) with genomic features information.

out.file

Name of the file to write the GFF-table to.

Author

Lars Snipen and Kristian Hovde Liland.

Details

A GFF-table is simply a tibble whose columns follow the GFF3 format, see https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md for details. There is one row for each feature.

The following columns should always be in a full gff.table of the GFF3 format:

  • Seqid. A unique identifier of the genomic sequence on which the feature resides.

  • Source. A description of the procedure that generated the feature, e.g. "R-package micropan::findOrfs".

  • Type. The type of feature, e.g. "ORF", "16S" etc.

  • Start. The leftmost coordinate. This is the start if the feature is on the Sense strand, but the end if it is on the Antisense strand.

  • End. The rightmost coordinate. This is the end if the feature is on the Sense strand, but the start if it is on the Antisense strand.

  • Score. A numeric score (E-value, P-value) from the Source.

  • Strand. A "+" indicates Sense strand, a "-" Antisense.

  • Phase. Only relevant for coding genes. The values 0, 1 or 2 indicate the reading frame, i.e. the number of bases to offset the Start in order to be in the reading frame.

  • Attributes. A single string with semicolon-separated tokens providing additional information.
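The relation between Start, End and Strand described in the list above can be sketched in base R. The table below is a made-up toy example, and the column names first.base, last.base and length are hypothetical helpers introduced only for this illustration; they are not part of a gff.table.

```r
# Toy table with made-up values, using the column names listed above
gff <- data.frame(
  Seqid  = c("contig_1", "contig_1"),
  Start  = c(101L, 400L),
  End    = c(250L, 520L),
  Strand = c("+", "-"),
  stringsAsFactors = FALSE
)

# On the Sense strand the feature begins at Start (leftmost coordinate),
# on the Antisense strand it begins at End (rightmost coordinate)
gff$first.base <- ifelse(gff$Strand == "+", gff$Start, gff$End)
gff$last.base  <- ifelse(gff$Strand == "+", gff$End, gff$Start)

# The feature length does not depend on the strand
gff$length <- gff$End - gff$Start + 1L

print(gff)
```

Since Start is always the leftmost coordinate, Start <= End holds for every row regardless of strand.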

Missing values are described by "." in the GFF3 format. This is also done here, except for the numerical columns Start, End, Score and Phase. In these columns NA is used instead, and it is replaced by "." when writing to file.
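As a minimal sketch of this convention in base R, the two helpers below convert between NA in a numerical column and "." in the text written to file. The function names to.gff.field and from.gff.field are hypothetical and only illustrate the idea; this is not necessarily how readGFF and writeGFF are implemented.

```r
# Numerical columns as they would be held in R, with NA for missing values
score <- c(1e-10, NA, 0.05)
phase <- c(0L, NA, 2L)

# Hypothetical helper: text representation for the file, NA becomes "."
to.gff.field <- function(x) {
  out <- as.character(x)
  out[is.na(x)] <- "."
  out
}

# Hypothetical helper: back from file text to numbers, "." becomes NA
from.gff.field <- function(x) {
  x[x == "."] <- NA
  as.numeric(x)
}

score.txt <- to.gff.field(score)
phase.txt <- to.gff.field(phase)
print(phase.txt)
print(from.gff.field(phase.txt))
```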

The readGFF function will also read files where sequences in FASTA format are added after the GFF-table. This file section must always start with the line ##FASTA. This fasta object is added to the GFF-table as an attribute (use attr(gff.tbl, "FASTA") to retrieve it).
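To illustrate the file layout, the base R sketch below writes a tiny made-up GFF file with a ##FASTA section to a temporary file and splits it into the table part and the sequence part. It only shows the structure of such a file under these assumptions; it is not the code used by readGFF, which returns a tibble and stores the sequences in the "FASTA" attribute.

```r
# A made-up GFF3 file: one feature line followed by a ##FASTA section
gff.lines <- c(
  "##gff-version 3",
  "contig_1\texample\tORF\t101\t250\t.\t+\t0\tID=orf_1",
  "##FASTA",
  ">contig_1",
  "ATGAAACCCGGGTTTTAA"
)
tmp <- tempfile(fileext = ".gff")
writeLines(gff.lines, tmp)

all.lines <- readLines(tmp)
fasta.idx <- match("##FASTA", all.lines)   # NA if there is no FASTA section
if (is.na(fasta.idx)) {
  table.lines <- all.lines
  fasta.lines <- character(0)
} else {
  table.lines <- all.lines[seq_len(fasta.idx - 1)]
  fasta.lines <- all.lines[-seq_len(fasta.idx)]
}
# Drop header and comment lines from the table part
table.lines <- table.lines[!grepl("^#", table.lines)]

# Parse the nine tab-separated columns; "." is read as NA
gff.df <- read.delim(
  text = table.lines, header = FALSE, sep = "\t", quote = "",
  na.strings = ".", stringsAsFactors = FALSE,
  col.names = c("Seqid", "Source", "Type", "Start", "End",
                "Score", "Strand", "Phase", "Attributes"),
  colClasses = c("character", "character", "character", "integer", "integer",
                 "numeric", "character", "integer", "character")
)
print(gff.df)
print(fasta.lines)
```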

See Also

findOrfs, lorfs.

Examples

# Using a GFF file in this package
gff.file <- file.path(path.package("microseq"),"extdata","small.gff")

# Reading gff-file
gff.tbl <- readGFF(gff.file)
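The example above only reads a file. A possible round trip with writeGFF, using the signatures shown under Usage, could look like the sketch below. It assumes that microseq is installed and is skipped otherwise; the temporary file name out.file is chosen just for this illustration.

```r
# Temporary output file (name chosen for this sketch only)
out.file <- tempfile(fileext = ".gff")

# Only run if the microseq package is available
if (requireNamespace("microseq", quietly = TRUE)) {
  # The example file shipped with the package
  gff.file <- system.file("extdata", "small.gff", package = "microseq")
  gff.tbl <- microseq::readGFF(gff.file)

  # Write the table to file, then read it back
  microseq::writeGFF(gff.tbl, out.file)
  gff.tbl2 <- microseq::readGFF(out.file)

  # Compare the dimensions of the original and the re-read table
  print(dim(gff.tbl))
  print(dim(gff.tbl2))
}
```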