genomeIntervals (version 1.28.0)

readGff3,character-method: readGff3

Description

Read (write) a Genome_intervals_stranded object from (to) a GFF3 file

Usage

readGff3(file, isRightOpen=FALSE, quiet=FALSE)
readBasePairFeaturesGff3(file, quiet=FALSE)
readZeroLengthFeaturesGff3(file, quiet=FALSE)
writeGff3(object, file)

Arguments

file
The name of the gff file to read/write.
isRightOpen
Whether the file uses right-open intervals. Because the format description is at best imprecise, one could argue that a GFF3 file follows a right-open convention, but most GFF3 files follow a right-closed convention. Hence, as of version 1.25.1, the default has been changed to isRightOpen = FALSE. See the Details section on how to restore the older behaviour.
quiet
Logical; if TRUE, verbosity is turned off when reading a GFF3 file.
object
The Genome_intervals object to write to the GFF3 file.

Value

  • readGff3 and friends: a Genome_intervals_stranded object image of the gff file. The GFF3 fields seqid, source, type, score, strand, phase and attributes are stored in the annotation slot and renamed to seq_name, source, type, score, strand, phase and gffAttributes, respectively.
  • writeGff3: dispatches to write.table and hence returns the same value as write.table.

Details

  • readGff3: make a Genome_intervals_stranded object from a gff file in GFF3 format.
  • readBasePairFeaturesGff3: same as readGff3 with isRightOpen = FALSE, i.e. no zero-length intervals are created. This is the default behaviour since v1.25.1.
  • readZeroLengthFeaturesGff3: same as readGff3 with isRightOpen = TRUE, i.e. a zero-length interval is created whenever a feature's start equals its end. This was the default prior to version 1.25.1.
  • writeGff3: write a Genome_intervals object to a gff file in GFF3 format.

The file must follow the GFF3 format specification at http://www.sequenceontology.org/gff3.shtml. Because that definition is imprecise, and to allow for zero-length features, readGff3 assumed right-open intervals by default until v1.25.1. Since the community consensus has settled on closed intervals, the default behaviour of readGff3 was changed accordingly. readGff3 is now a wrapper that dispatches to two sub-functions, which may also be called directly: readBasePairFeaturesGff3 assumes closed intervals and hence creates no zero-length intervals, whereas readZeroLengthFeaturesGff3 assumes right-open intervals.
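To make the difference between the two conventions concrete, here is a minimal base R sketch that computes how many bases a GFF feature covers under each one. The helper feature_width is hypothetical and not part of genomeIntervals; it only illustrates the arithmetic, not how the package builds its intervals.

```r
# Hypothetical helper (not part of genomeIntervals): number of bases covered
# by GFF features with 1-based coordinates start..end under each convention.
feature_width <- function(start, end, isRightOpen = FALSE) {
  if (isRightOpen) {
    end - start       # [start, end): the end base is excluded
  } else {
    end - start + 1   # [start, end]: both the start and end bases are included
  }
}

starts <- c(100, 250)
ends   <- c(150, 250)

feature_width(starts, ends, isRightOpen = FALSE)  # closed intervals
feature_width(starts, ends, isRightOpen = TRUE)   # right-open intervals
```

The second feature has start equal to end: under the closed convention it covers one base, while under the right-open convention its width is 0, which is the case readZeroLengthFeaturesGff3 turns into a zero-length interval.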

Some more noteworthy details:

The file is read as a table, and meta-information (lines starting with ###) is not parsed.

A “.” in a field of the gff file, for example the score or frame field, is converted to NA.
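The table-reading behaviour described above can be approximated in base R with read.table. This is a sketch under assumptions, not the package's own reader: header and comment lines are dropped beforehand, “.” is declared as the NA string, and the column names are taken from the Value section. The two GFF lines are made-up sample data.

```r
# Two made-up GFF3 records preceded by a header line
gff_lines <- c(
  "##gff-version 3",
  "chrI\tSGD\tgene\t335\t649\t.\t+\t.\tID=YAL069W;Name=YAL069W",
  "chrI\tSGD\tCDS\t335\t649\t.\t+\t0\tParent=YAL069W"
)

# Drop lines starting with "#" instead of using comment.char, so that a "#"
# inside the attributes column would not truncate the line
gff_lines <- gff_lines[!grepl("^#", gff_lines)]

tab <- read.table(
  text = gff_lines, sep = "\t", quote = "", comment.char = "",
  na.strings = ".",  # "." in any field becomes NA
  col.names = c("seq_name", "source", "type", "start", "end",
                "score", "strand", "phase", "gffAttributes"),
  stringsAsFactors = FALSE
)
tab
```

Both score values and the first phase value come back as NA, while the remaining columns keep their types.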

When the GFF file follows the right-open interval convention (isRightOpen is TRUE), GFF entries whose end base equals their first base are recognized as zero-length features and loaded as inter_base intervals.

Strand entries in the file are expected to be '.', '?', '+' or '-'. The first two are mapped to NA.
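The strand mapping can be sketched in base R as follows. The helper map_strand is hypothetical; encoding the result as a factor with levels "+" and "-" is an assumption made for illustration and is not taken from the package source.

```r
# Hypothetical helper: map GFF3 strand entries to a factor with levels "+"/"-";
# "." (unstranded) and "?" (unknown) become NA.
map_strand <- function(x) {
  # Reject anything other than the four allowed strand symbols
  stopifnot(all(x %in% c(".", "?", "+", "-")))
  factor(ifelse(x %in% c("+", "-"), x, NA), levels = c("+", "-"))
}

map_strand(c("+", "-", ".", "?"))
```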

readGff3 may be able to construct a Genome_intervals_stranded object from the input file even though that object is not valid. In that case a warning is generated and the constructed object is returned anyway, so that it can be inspected.

Any FASTA entries at the end of the file are ignored.
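One way to ignore a trailing FASTA section is to cut the file's lines at the first "##FASTA" directive, or at the first line starting with ">" in case the directive is missing. The base R sketch below does that; drop_fasta is a hypothetical helper and not necessarily how readGff3 handles it internally.

```r
# Hypothetical helper: keep only the feature part of a GFF3 file by dropping
# everything from the first "##FASTA" directive (or ">" header line) onwards.
drop_fasta <- function(lines) {
  fasta_start <- grep("^(##FASTA|>)", lines)
  if (length(fasta_start) > 0) {
    lines <- lines[seq_len(fasta_start[1] - 1)]
  }
  lines
}

gff_lines <- c(
  "##gff-version 3",
  "chrI\tSGD\tgene\t1\t10\t.\t+\t.\tID=g1",
  "##FASTA",
  ">chrI",
  "ACGTACGTAC"
)
drop_fasta(gff_lines)
```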

See Also

The functions getGffAttribute and parseGffAttributes for parsing GFF attributes.
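GFF3 attributes are stored as semicolon-separated key=value pairs in the gffAttributes column. The base R sketch below shows the general idea of extracting one attribute; parse_attributes and get_attribute are hypothetical helpers, not the implementation of getGffAttribute or parseGffAttributes, and they do not percent-decode escaped characters.

```r
# Hypothetical helper: split one attributes string ("key=value;key=value")
# into a named character vector of values.
parse_attributes <- function(attr) {
  pairs <- strsplit(attr, ";", fixed = TRUE)[[1]]
  pairs <- pairs[nzchar(pairs)]            # drop empty pieces
  keys <- sub("=.*$", "", pairs)           # text before the first "="
  values <- sub("^[^=]*=", "", pairs)      # text after the first "="
  setNames(values, keys)
}

# Hypothetical helper: extract one attribute from a vector of attribute
# strings, returning NA where the key is absent.
get_attribute <- function(attrs, key) {
  vapply(attrs, function(a) unname(parse_attributes(a)[key]),
         character(1), USE.NAMES = FALSE)
}

get_attribute(c("ID=YAL069W;Name=YAL069W", "Parent=YAL069W"), "Name")
```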

Examples

library(genomeIntervals)

# Get the path to the example files shipped with the package
filePath <- system.file("example_files", package = "genomeIntervals")

# Load SGD gff
# SGD uses the closed-interval convention, hence isRightOpen = FALSE
# (the default since version 1.25.1)
gff <- readGff3(file.path(filePath, "sgd_simple.gff"), isRightOpen = FALSE)

head(gff, 10)

head(annotation(gff), 10)

## Not run:
## write the gff3 file
# writeGff3(gff, file = "sgd_simple.gff")
## End(Not run)
