readTSVmod: Read a Metabolic Network in a TSV (CSV) Format

Description

The function readTSVmod reads metabolic networks in text files, following a character-separated value format. Each line should contain one entry; the default value separator is a tab. Output files from the BiGG database are compatible.

Usage

readTSVmod(prefix, suffix,
             reactList, metList = NA, modDesc = NA,
             fielddelim = "\t", entrydelim = ", ", extMetFlag = "b",
             excludeComments = TRUE,
             oneSubSystem = TRUE,
             mergeMet = TRUE,
             balanceReact = TRUE,
             remUnusedMetReact = TRUE,
             singletonMet = FALSE,
             deadEndMet = FALSE,
             remMet = FALSE,
             constrMet = FALSE,
             tol = SYBIL_SETTINGS("TOLERANCE"),
             fpath = SYBIL_SETTINGS("PATH_TO_MODEL"),
             def_bnd = SYBIL_SETTINGS("MAXIMUM"),
             arrowlength = NULL,
             quoteChar = "",
             commentChar, ...)

Arguments

prefix

A single character string giving the prefix for three possible input files (see Details below).

suffix

A single character string giving the file name extension. If missing, the value of suffix depends on the argument fielddelim, see Details below. Default: "tsv".

reactList

A single character vector giving a file name containing a reaction list. Only necessary if argument prefix is missing.

metList

A single character vector giving a file name containing a metabolite list. Default: NA.

modDesc

A single character vector giving a file name containing a model description. Default: NA.

fielddelim

A single character string giving the value separator. Default: "\t".

entrydelim

A single character string giving the separator for values containing more than one entry. Default: ", ".

extMetFlag

A single character string giving the identifier for metabolites which are outside the system boundary. Only necessary if the model is a closed one. Default: "b".

excludeComments

A Boolean value. Sometimes, the reaction abbreviations and/or the metabolite abbreviations contain comments in square brackets. If set to TRUE, these comments will be removed. If set to FALSE, whitespace inside comments in metabolite abbreviations will be removed, while comments in reaction abbreviations stay unchanged. A reaction id with a comment is, for example, the string pfk [comment], with [comment] being the comment. There must be at least one whitespace character between id and comment; otherwise the bracketed text is treated as a compartment flag. Default: TRUE.

oneSubSystem

A Boolean value. Ignore parameter entrydelim for the field ‘subsystem’, if every reaction belongs to exactly one subsystem. Default: TRUE.

mergeMet

Boolean: if set to TRUE, metabolites used more than once as reactant or product in a particular reaction are added up, see Details below. If set to FALSE, the last value is used without warning. Default: TRUE.

balanceReact

Boolean: if set to TRUE, metabolites used as reactant and product in a particular reaction at the same time are balanced, see Details below. If set to FALSE, the last value is used without warning (reactants before products). Default: TRUE.

remUnusedMetReact

Boolean: if set to TRUE, metabolites and reactions which are not used in the stoichiometric matrix will be removed. A metabolite or a reaction is considered unused if the corresponding element of rowSums (metabolites) or colSums (reactions) of the binary version of the stoichiometric matrix is zero, see Details below. If set to FALSE, only a warning is given. Default: TRUE.

singletonMet

Boolean: if set to TRUE, metabolites appearing only once in the stoichiometric matrix are identified. Metabolites appear only once, if rowSums of the binary stoichiometric matrix is one in the corresponding row, see details below. Default: FALSE.

deadEndMet

Boolean: if set to TRUE, metabolites which are produced but not consumed, or vice versa, are identified, see Details below. If both arguments singletonMet and deadEndMet are set to TRUE, the function will first look for singleton metabolites and exclude them (and the corresponding reactions) from the search list. Afterwards, dead end metabolites are searched for only in the smaller model. Default: FALSE.

remMet

Boolean: if set to TRUE, metabolites identified as singleton or dead end metabolites will be removed from the model. Reactions containing such metabolites will also be removed. Default: FALSE.

constrMet

Boolean: if set to TRUE, reactions containing metabolites identified as singleton or dead end metabolites will be constrained to zero. Default: FALSE.

tol

A single numeric value, giving the smallest positive floating point number unequal to zero, see details below. Default: SYBIL_SETTINGS("TOLERANCE").

fpath

A single character string giving the path to a certain directory containing the model files. Default: SYBIL_SETTINGS("PATH_TO_MODEL").

def_bnd

A single numeric value giving the absolute value used for upper and lower reaction bounds. Default: SYBIL_SETTINGS("MAXIMUM").

arrowlength

A single numeric or character value or NULL. This argument controls the number of "-" and "=" used in reaction arrows in the equation strings. If set to NULL, one or more symbols are used. The regular expression used is "<?[=-]+>". If numeric, all reaction arrows must consist of exactly arrowlength signs. The regular expression used is "<?[=-]{arrowlength}>". If character, arrowlength must be a regular expression and will be used as "<?[=-]arrowlength>". For example, if arrowlength is "{1,2}" the regular expression is "<?[=-]{1,2}>", meaning the reaction arrow can consist of one or two signs. In any case, the completed regular expression will always be used with argument perl = TRUE. Default: NULL.

quoteChar

Set of quoting characters used for the argument quote in read.table, see there for details. Default: "" (disable quoting).

commentChar

A single character used for the argument comment.char in read.table, see there for details. If a comment character is needed, "@" (at) is a reasonable choice. Default: "".

…

Further arguments passed to read.table, e.g. argument quote, comment.char or argument fill, if some lines do not have enough elements. If all fields are in double quotes, for example, set quote to "\"".
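
As an illustration of the arrowlength rules described above, the following minimal sketch builds the three documented regular expressions. The helper arrow_regex is hypothetical and not part of sybil; it only mirrors the patterns quoted in the argument description.

```r
# Hypothetical helper (not a sybil function): build the arrow pattern
# from an arrowlength value, following the three documented cases.
arrow_regex <- function(arrowlength = NULL) {
  if (is.null(arrowlength)) {
    "<?[=-]+>"                              # one or more signs
  } else if (is.numeric(arrowlength)) {
    paste0("<?[=-]{", arrowlength, "}>")    # fixed number of signs
  } else {
    paste0("<?[=-]", arrowlength, ">")      # user-supplied quantifier
  }
}

eq <- "a + b --> c"

# As in the documentation, the pattern is applied with perl = TRUE.
grepl(arrow_regex(),        eq, perl = TRUE)
grepl(arrow_regex(2),       eq, perl = TRUE)
grepl(arrow_regex("{1,2}"), eq, perl = TRUE)
```

The toy equation uses a two-sign arrow, so a pattern built with arrowlength = 3 would not find an arrow in it.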

Value

An instance of class modelorg.

Details

A metabolic model consists of three input files:

<prefix>_react.<suffix> containing all reactions.
<prefix>_met.<suffix> containing all metabolites.
<prefix>_desc.<suffix> containing a model description.

All of these files must be character-separated value files (for a detailed format description and examples, see the package vignette). The argument prefix is the part of the file names that all three have in common (e.g. if they were produced by modelorg2tsv). Alternatively, the arguments reactList, metList and modDesc can be used. A file containing all reactions must be present; everything else is optional.

If suffix is missing, it is set according to the value of fielddelim:

`"\t"`	`"tsv"`
`";"`	`"csv"`
`","`	`"csv"`
`"|"`	`"dsv"`
anything else	`"dsv"`
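
The mapping in the table above can be sketched as a small lookup. The function guess_suffix is a hypothetical name used for illustration only; it is not sybil's implementation.

```r
# Hypothetical helper mirroring the table above (not sybil code).
guess_suffix <- function(fielddelim = "\t") {
  if (fielddelim == "\t") {
    "tsv"
  } else if (fielddelim %in% c(";", ",")) {
    "csv"
  } else {
    "dsv"   # "|" and anything else
  }
}

# With prefix "Ec_core", the reaction list would then be looked up as:
paste0("Ec_core", "_react.", guess_suffix("\t"))
```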

The argument ... is passed to read.table.

It can be necessary to turn off quoting with quoteChar = "" (the default), for example when metabolite names contain the quoting character "'", as in 3',5'-bisphosphate nucleotidase. If all fields are in quotes (e.g. files generated by modelorg2tsv), use quoteChar = "\"".

The input files are read using the function read.table. The argument header is set to TRUE and the argument sep is set to the value of fielddelim. Everything else can be passed via the ... argument.
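
A minimal sketch of such a read.table call is shown here on a toy reaction list held in a string. The data and variable names are invented for illustration, and the exact call made inside readTSVmod may differ; only header = TRUE and sep = fielddelim are taken from the description above.

```r
# Toy reaction list (invented data); real models are read from files.
txt <- paste(
  "abbreviation\tequation\tlowbnd\tuppbnd",
  "R1\ta --> b\t0\t1000",
  "R2\tb <==> c\t-1000\t1000",
  sep = "\n"
)

fielddelim <- "\t"

react <- read.table(text = txt,
                    header = TRUE,        # first line holds column names
                    sep = fielddelim,     # value separator
                    quote = "",           # quoting disabled, like quoteChar = ""
                    comment.char = "",    # no comment character
                    stringsAsFactors = FALSE)

react$equation
```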

The header for the reactions list may have the following columns:

`"abbreviation"`	a unique reaction id
`"name"`	a reaction name
`"equation"`	the reaction equation
`"reversible"`	TRUE, if the reaction is reversible
`"compartment"`	reaction compartment(s) (currently unused)
`"lowbnd"`	lower bound
`"uppbnd"`	upper bound
`"obj_coef"`	objective coefficient
`"rule"`	gene to reaction association
`"subsystem"`	subsystem of the reaction

Every entry except for "equation" is optional. If there are missing values in field "lowbnd", they will be set to -1 * def_bnd; if there are missing values in field "uppbnd", they will be set to def_bnd; if there are missing values in field "obj_coef", they will be set to 0.
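
The default handling of missing values described above can be sketched as follows. The data frame is invented, and def_bnd = 1000 is an assumed stand-in for SYBIL_SETTINGS("MAXIMUM").

```r
def_bnd <- 1000  # assumed value; sybil takes it from SYBIL_SETTINGS("MAXIMUM")

# Toy reaction table with missing entries (invented data).
react <- data.frame(
  abbreviation = c("R1", "R2", "R3"),
  lowbnd       = c(0, NA, -10),
  uppbnd       = c(NA, NA, 10),
  obj_coef     = c(1, NA, NA)
)

# Fill the gaps as documented.
react$lowbnd[is.na(react$lowbnd)]     <- -1 * def_bnd
react$uppbnd[is.na(react$uppbnd)]     <- def_bnd
react$obj_coef[is.na(react$obj_coef)] <- 0

react
```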

The header for the metabolites list may have the following columns:

`"abbreviation"`	a unique metabolite id
`"name"`	a metabolite name
`"compartment"`	metabolite compartment (currently unused)

If a metabolite list is provided, it is supposed to contain at least the entries "abbreviation" and "name".

The header for the model description file may have the following columns:

`"name"`	a name for the model
`"id"`	a shorter model id
`"description"`	a model description
`"compartment"`	the compartments
`"abbreviation"`	unique compartment abbreviations
`"Nmetabolites"`	number of metabolites
`"Nreactions"`	number of reactions
`"Ngenes"`	number of independent genes
`"Nnnz"`	number of non-zero elements in the stoichiometric matrix

If a file contains a certain column name, there must be no empty entries.

If a model description file is provided, it is supposed to contain at least the entries "name" and "id". Otherwise, the filename of the reactions list will be used (the filename extension and the string _react at the end of the filename will be removed).
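
As a rough sketch of the fallback naming rule above, the model name can be derived from the reaction list file name with two substitutions. This is an illustration using base R, not the code used inside readTSVmod.

```r
react_file <- "Ec_core_react.tsv"

# Strip the file name extension, then the trailing "_react".
no_ext   <- sub("\\.[^.]*$", "", basename(react_file))
mod_name <- sub("_react$", "", no_ext)

mod_name
```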

The compartments in which a reaction takes place are determined by the compartment flags of the participating metabolites.

All fields in the output files of modelorg2tsv are in double quotes. In order to read them, set argument quoteChar to "\"".

Please read the package vignette for detailed information about input formats and examples.

If a metabolite is used more than once as product or reactant of a particular reaction, it is merged: a + (2) a is converted to (3) a and a warning will be given.

If a metabolite is used first as reactant and then as product of a particular reaction, the reaction is balanced: (2) b + a -> b + c is converted to b + a -> c.
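
Both rules amount to summing the signed stoichiometric coefficients per metabolite. The sketch below assumes the usual sign convention (reactants negative, products positive) and uses invented variable names; it illustrates the arithmetic only and is not sybil's parser.

```r
# Balancing: "(2) b + a -> b + c"
mets <- c("b", "a", "b", "c")
coef <- c(-2, -1, 1, 1)          # reactants negative, products positive

net <- tapply(coef, mets, sum)   # one net coefficient per metabolite
net <- net[net != 0]             # drop metabolites that cancel out
net                              # a and b consumed, c produced: b + a -> c

# Merging: "a + (2) a" on the reactant side
merged <- tapply(c(-1, -2), c("a", "a"), sum)
merged                           # a single entry for a with coefficient -3
```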

A binary version of the stoichiometric matrix \(S\) is constructed via \(\left|S\right| > tol\).

A binary version of the stoichiometric matrix \(S\) is scanned for reactions and metabolites which are not used in S. If there are some, a warning will be given and the corresponding reactions and metabolites will be removed from the model if remUnusedMetReact is set to TRUE.
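
A minimal sketch of this check on a toy matrix follows. The matrix is invented, and tol = 1e-6 is an assumed value standing in for SYBIL_SETTINGS("TOLERANCE").

```r
tol <- 1e-6   # assumed tolerance

# Toy stoichiometric matrix: rows are metabolites, columns are reactions.
S <- matrix(c(-1,  0, 0,
               1, -1, 0,
               0,  0, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("a", "b", "x"), c("R1", "R2", "R3")))

Sbin <- abs(S) > tol   # binary version of S

unused_met   <- rownames(S)[rowSums(Sbin) == 0]   # metabolites in no reaction
unused_react <- colnames(S)[colSums(Sbin) == 0]   # reactions with no metabolite

# What removal would leave behind when remUnusedMetReact is TRUE.
S_clean <- S[rowSums(Sbin) > 0, colSums(Sbin) > 0, drop = FALSE]
```

In this toy case metabolite x and reaction R3 are unused, so S_clean keeps only a, b, R1 and R2.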

The binary version of the stoichiometric matrix \(S\) is scanned for metabolites, which are used only once in S. If there are some, at least a warning will be given. If either constrMet or remMet is set to TRUE, the binary version of \(S\) is scanned for paths of singleton metabolites. If constrMet is set to TRUE, reactions containing those metabolites will be constrained to zero; if remMet is set to TRUE, the metabolites and the reactions containing those metabolites will be removed from the network.

In order to find paths of singleton metabolites, a binary version of the stoichiometric matrix \(S\) is used. The row sums give the vector of metabolite usage: each element is the number of reactions a metabolite participates in. A single metabolite (singleton) is a metabolite with a row sum of one. All columns of \(S\) (reactions) containing singleton metabolites are set to zero, and the search for singleton metabolites is repeated until none are found.
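
The iterative search just described can be sketched as a short loop over a binary toy matrix. The matrix and variable names are invented, and this is an illustration of the idea rather than the code used by sybil.

```r
# Binary toy matrix: rows are metabolites, columns are reactions.
M <- matrix(c(1, 0, 0, 0,
              1, 1, 0, 0,
              0, 1, 1, 1,
              0, 0, 1, 1),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("a", "b", "c", "d"),
                            c("R1", "R2", "R3", "R4")))

sing_met   <- character(0)
sing_react <- character(0)

repeat {
  hit <- rowSums(M) == 1                 # metabolites used exactly once
  if (!any(hit)) break
  cols <- colSums(M[hit, , drop = FALSE]) > 0
  sing_met   <- c(sing_met,   rownames(M)[hit])
  sing_react <- c(sing_react, colnames(M)[cols])
  M[, cols] <- 0                         # zero the affected reactions
}

sing_met
sing_react
```

Here a is a singleton from the start; zeroing R1 turns b into a singleton, and zeroing R2 leaves c and d each in two reactions, so the loop stops.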

The algorithm to find dead end metabolites works in a similar way, but not on the binary version of the stoichiometric matrix. Here, metabolite i is considered a dead end if, for example, it is produced by reaction j but not used by any other reaction k.
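
A simplified sketch of the dead end test, using the signed matrix, is given below. It assumes that all reactions are irreversible and performs a single pass; the toy matrix, the tol value and the variable names are assumptions made for illustration, and the actual implementation may treat reversible reactions differently.

```r
tol <- 1e-6   # assumed tolerance

# Toy signed matrix: R0 imports a, R1 converts a to b, R2 converts b to c.
S <- matrix(c(1, -1,  0,
              0,  1, -1,
              0,  0,  1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("R0", "R1", "R2")))

produced <- rowSums(S >  tol) > 0   # metabolite appears as a product
consumed <- rowSums(S < -tol) > 0   # metabolite appears as a reactant

# Produced but never consumed, or consumed but never produced.
dead_end <- rownames(S)[xor(produced, consumed)]
dead_end
```

In this toy network only c is flagged, since it is produced by R2 and consumed by no reaction.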

References

The BiGG database http://bigg.ucsd.edu/.

Schellenberger, J., Park, J. O., Conrad, T. C., and Palsson, B. Ø. (2010) BiGG: a Biochemical Genetic and Genomic knowledgebase of large scale metabolic reconstructions. BMC Bioinformatics 11, 213.

Becker, S. A., Feist, A. M., Mo, M. L., Hannum, G., Palsson, B. Ø. and Herrgard, M. J. (2007) Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox. Nat Protoc 2, 727-738.

Schellenberger, J., Que, R., Fleming, R. M. T., Thiele, I., Orth, J. D., Feist, A. M., Zielinski, D. C., Bordbar, A., Lewis, N. E., Rahmanian, S., Kang, J., Hyduke, D. R. and Palsson, B. Ø. (2011) Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox v2.0. Nat Protoc 6, 1290-1307.

Examples

# NOT RUN {
  ## read example dataset
  mp  <- system.file(package = "sybil", "extdata")
  mod <- readTSVmod(prefix = "Ec_core", fpath = mp, quoteChar = "\"")

  ## redirect warnings to a log file
  sink(file = "warn.log")
  mod <- readTSVmod(prefix = "Ec_core", fpath = mp, quoteChar = "\"")
  warnings()
  sink()
  unlink("warn.log")  

  ## print no warnings
  suppressWarnings(
    mod <- readTSVmod(prefix = "Ec_core", fpath = mp, quoteChar = "\""))

  ## print no messages
  suppressMessages(
    mod <- readTSVmod(prefix = "Ec_core", fpath = mp, quoteChar = "\""))

# }
# NOT RUN {
  ## set number of warnings to keep
  options(nwarnings = 1000)
  
  ## redirect every output to a file
  zz <- file("log.Rout", open = "wt")
  sink(zz)
  sink(zz, type = "message")
  mod <- readTSVmod(prefix = "Ec_core", fpath = mp, quoteChar = "\"")
  warnings()
  sink(type = "message")
  sink()
  close(zz)  
# }
