readCGATS: read tables from files in ANSI/CGATS.17 format

Description

The CGATS text format supports a preamble followed by N tables, where N \(\ge\) 1. Each table can have a separate header. A table may or may not contain spectral data, see Note. The function converts each table to a data.frame with attributes; see Details.

Usage

readCGATS( path, collapsesingle=FALSE )

Value

readCGATS() returns a list of data.frames - one data.frame for each table found in path. The list and each individual data.frame have attributes, see Details.

If path has only a single table (the majority of files have only 1) and collapsesingle is TRUE, then the attributes of the list are copied to those of the data.frame, and the data.frame is then returned. The name of the table is lost.

If there is an error in any table, then the function returns NULL.

Arguments

path: the path name of a single file, in CGATS format
collapsesingle: If path has only one table (N=1) and collapsesingle is TRUE, then return the single data.frame (instead of a list with 1 data.frame). If path has multiple tables (N \(\ge\) 2), then collapsesingle is ignored.

Details

The returned list is given attributes: "path", "preamble", and (if present) "date", "created", "originator", and "file_descriptor". The attribute values are all character vectors. The value of attribute "path" is the argument path, and the other values are extracted from "preamble". The length of "preamble" is (typically) greater than 1, and the others have length 1. Each line of the preamble is a keyword-value pair. The keyword ORIGINATOR is converted to attribute "originator". The keyword FILE_DESCRIPTOR is converted to attribute "file_descriptor". The keyword CREATED is converted to attributes "created" and "date". The list is also given names. If the keyword TABLE_NAME is present in the table header, then its value is used. Otherwise the names are "TABLE_1", "TABLE_2", ...

Each data.frame in the list is assigned attributes: "header", and (if present) "descriptor". The length of "header" is (typically) greater than 1, and "descriptor" has length 1. Each line of the table header is a keyword-value pair. The keywords DESCRIPTOR and TABLE_DESCRIPTOR are converted to attribute "descriptor".

For the lines between BEGIN_DATA and END_DATA, two conventions for separating the values are supported:

In the standard convention, fields are separated by contiguous spaces or tabs, and character strings (which may have embedded spaces or even tabs) are enclosed by double-quotes. This is is the convention in the CGATS standard. The function scan() is used here.
In the non-standard convention, fields are separated by a single tab, and character strings (which may have embedded spaces but not tabs) are not enclosed by double-quotes. This convention is often easier to work with in spreadsheet software. The function strsplit() is used here.

The function readCGATS() selects the separation convention by examining the line after BEGIN_DATA_FORMAT. If this line is split by a single tab and the number of fields matches that given on the NUMBER_OF_FIELDS line, then the non-standard convention is selected; otherwise, the standard convention is selected.

References

ANSI/CGATS.17. Graphic technology - Exchange format for colour and process control data using XML or ASCII text. https://webstore.ansi.org/ 2009.

ISO/28178. Graphic technology - Exchange format for colour and process control data using XML or ASCII text. https://www.iso.org/standard/44527.html. 2009.

CGATS.17 Text File Format. http://www.colorwiki.com/wiki/CGATS.17_Text_File_Format.

Examples

Run this code

#   read file with 2 tables of non-spectral data
A70 = readSpectra( system.file( "tests/A70.ti3", package='colorSpec' ) )
length(A70)         # [1] 2   # the file has 2 tables
ncol( A70[[1]] )    # [1] 7   # the 1st table has 7 columns
ncol( A70[[2]] )    # [1] 4   # the 2nd table has 4 columns

Run the code above in your browser using DataLab