The function readTSVmod reads metabolic networks from text files,
following a character-separated value format. Each line should contain one
entry; the default value separator is a tab. Output files from the
BiGG database are compatible.
readTSVmod(prefix, suffix,
reactList, metList = NA, modDesc = NA,
fielddelim = "\t", entrydelim = ", ", extMetFlag = "b",
excludeComments = TRUE,
oneSubSystem = TRUE,
mergeMet = TRUE,
balanceReact = TRUE,
remUnusedMetReact = TRUE,
singletonMet = FALSE,
deadEndMet = FALSE,
remMet = FALSE,
constrMet = FALSE,
tol = SYBIL_SETTINGS("TOLERANCE"),
fpath = SYBIL_SETTINGS("PATH_TO_MODEL"),
def_bnd = SYBIL_SETTINGS("MAXIMUM"),
arrowlength = NULL,
quoteChar = "",
commentChar, ...)
prefix: A single character string giving the prefix for three possible input
files (see Details below).
suffix: A single character string giving the file name extension. If missing,
the value of suffix depends on the argument fielddelim, see Details below.
Default: "tsv".
reactList: A single character vector giving a file name containing a reaction
list. Only necessary if argument suffix is empty.
metList: A single character vector giving a file name containing a metabolite
list. Default: NA.
modDesc: A single character vector giving a file name containing a model
description. Default: NA.
fielddelim: A single character string giving the value separator.
Default: "\t".
entrydelim: A single character string giving the separator for values
containing more than one entry. Default: ", ".
extMetFlag: A single character string giving the identifier for metabolites
which are outside the system boundary. Only necessary if the model is a closed
one. Default: "b".
excludeComments: A Boolean value. Sometimes the reaction abbreviations and/or
the metabolite abbreviations contain comments in square brackets. If set to
TRUE, these comments will be removed. If set to FALSE, whitespace inside
comments in metabolite abbreviations will be removed, while comments in
reaction abbreviations stay unchanged. A reaction id with a comment is, for
example, the string pfk [comment], with [comment] being the comment. There
must be at least one whitespace character between id and comment, otherwise
the bracketed text will be treated as a compartment flag. Default: TRUE.
oneSubSystem: A Boolean value. Ignore parameter entrydelim for the field
‘subsystem’, if every reaction belongs to exactly one subsystem.
Default: TRUE.
mergeMet: Boolean: if set to TRUE, metabolites used more than once as reactant
or product in a particular reaction are added up, see Details below. If set
to FALSE, the last value is used without warning. Default: TRUE.
balanceReact: Boolean: if set to TRUE, metabolites used as reactant and product
in a particular reaction at the same time are balanced, see Details below. If
set to FALSE, the last value is used without warning (reactants before
products). Default: TRUE.
remUnusedMetReact: Boolean: if set to TRUE, metabolites and reactions which are
not used in the stoichiometric matrix will be removed. A metabolite or a
reaction is considered unused if the corresponding element of rowSums
(metabolites) or colSums (reactions) of the binary version of the
stoichiometric matrix is zero, see Details below. If set to FALSE, only a
warning is given. Default: TRUE.
singletonMet: Boolean: if set to TRUE, metabolites appearing only once in the
stoichiometric matrix are identified. A metabolite appears only once if
rowSums of the binary stoichiometric matrix is one in the corresponding row,
see Details below. Default: FALSE.
deadEndMet: Boolean: if set to TRUE, metabolites which are produced but not
consumed, or vice versa, are identified, see Details below. If both arguments
singletonMet and deadEndMet are set to TRUE, the function will first look for
singleton metabolites and exclude them (and the corresponding reactions) from
the search list. Afterwards, dead end metabolites are searched only in the
smaller model. Default: FALSE.
remMet: Boolean: if set to TRUE, metabolites identified as singleton or dead
end metabolites will be removed from the model. Reactions containing such
metabolites will be removed as well. Default: FALSE.
constrMet: Boolean: if set to TRUE, reactions containing metabolites identified
as singleton or dead end metabolites will be constrained to zero.
Default: FALSE.
tol: A single numeric value, giving the smallest positive floating point number
unequal to zero, see Details below. Default: SYBIL_SETTINGS("TOLERANCE").
fpath: A single character string giving the path to the directory containing
the model files. Default: SYBIL_SETTINGS("PATH_TO_MODEL").
def_bnd: A single numeric value. Absolute value for upper and lower bounds of
reactions. Default: SYBIL_SETTINGS("MAXIMUM").
arrowlength: A single numeric or character value or NULL. This argument
controls the number of "-" and "=" used in reaction arrows in the equation
strings. If set to NULL, one or more symbols are used. The regular expression
used is "<?[=-]+>". If numeric, all reaction arrows must consist of exactly
arrowlength signs. The regular expression used is "<?[=-]{arrowlength}>". If
character, arrowlength must be a regular expression and will be used as
"<?[=-]arrowlength>". For example, if arrowlength is "{1,2}" the regular
expression is "<?[=-]{1,2}>", meaning the reaction arrow can consist of one or
two signs. In any case, the completed regular expression will always be used
with argument perl = TRUE. Default: NULL.
quoteChar: Set of quoting characters used for the argument quote in read.table,
see there for details. Default: "" (disable quoting).
commentChar: A single character used for the argument comment.char in
read.table, see there for details. If a comment character is needed, “@” (at)
is a reasonable choice. Default: "".
...: Further arguments passed to read.table, e.g. argument quote, comment.char
or argument fill, if some lines do not have enough elements. If all fields are
in double quotes, for example, set quote to "\"".
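The three cases described for arrowlength can be illustrated with a small sketch. The helper arrow_regex is hypothetical and not part of sybil; it only assembles the regular expressions quoted above, and the equation string is made-up example data.

```r
# Hypothetical helper (not part of sybil): builds the arrow pattern
# from the three documented cases of 'arrowlength'.
arrow_regex <- function(arrowlength = NULL) {
  if (is.null(arrowlength)) {
    "<?[=-]+>"                                # one or more signs
  } else if (is.numeric(arrowlength)) {
    paste0("<?[=-]{", arrowlength, "}>")      # exactly 'arrowlength' signs
  } else {
    paste0("<?[=-]", arrowlength, ">")        # user-supplied quantifier
  }
}

# Split a made-up equation string at the reaction arrow.
eq    <- "(2) b + a --> b + c"
sides <- trimws(strsplit(eq, arrow_regex(), perl = TRUE)[[1]])
sides
```

How readTSVmod applies the pattern internally may differ; the sketch only shows that the documented expressions split an equation into a left and a right side.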
An instance of class modelorg.
A metabolic model consists of three input files:
<prefix>_react.<suffix> containing all reactions.
<prefix>_met.<suffix> containing all metabolites.
<prefix>_desc.<suffix> containing a model description.
All of these files must be character separated value files (for a detailed
format description and examples, see package vignette). The argument prefix
is the part of the file names that all three have in common (e.g. if they
were produced by modelorg2tsv).
Alternatively, the arguments reactList, metList and modDesc can be used. A
file containing all reactions is required; everything else is optional.
If suffix is missing, it is set according to the value of fielddelim:
"\t" | "tsv" |
";" | "csv" |
"," | "csv" |
"|" | "dsv" |
anything else | "dsv" |
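The mapping in the table above can be written as a small lookup. This is a sketch only: suffix_for is a hypothetical helper, not a sybil function, and the way readTSVmod implements the rule internally is not shown in this document.

```r
# Hypothetical helper (not part of sybil): derive the file name extension
# from the field delimiter, following the table above.
suffix_for <- function(fielddelim) {
  known <- c("\t" = "tsv", ";" = "csv", "," = "csv", "|" = "dsv")
  if (fielddelim %in% names(known)) known[[fielddelim]] else "dsv"
}

# File name of the reaction list for a given prefix and delimiter.
react_file <- paste0("Ec_core", "_react.", suffix_for("\t"))
react_file
```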
The argument ... is passed to read.table.
In some cases, it may be necessary to turn off quoting with quoteChar = ""
(default), e.g. if metabolite names contain quoting characters "'" as in
3',5'-bisphosphate nucleotidase. If all fields are in quotes (e.g. files
generated by modelorg2tsv), use quoteChar = "\"" for example.
The input files are read using the function read.table. The argument header
is set to TRUE and the argument sep is set to the value of fielddelim.
Everything else can be passed via the ... argument.
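As a rough illustration of the read.table settings described above, the following sketch writes a tiny, made-up reaction list to a temporary file and reads it back. The exact call inside readTSVmod may differ; stringsAsFactors = FALSE is an assumption added here for convenience.

```r
# Write a made-up two-reaction list to a temporary tab-separated file.
tmp <- tempfile(fileext = ".tsv")
writeLines(c("abbreviation\tequation\treversible",
             "R1\ta -> b\tFALSE",
             "R2\tb <=> c\tTRUE"), tmp)

reacts <- read.table(tmp,
                     header = TRUE,        # first line holds the column names
                     sep = "\t",           # corresponds to fielddelim
                     quote = "",           # corresponds to quoteChar
                     comment.char = "",    # corresponds to commentChar
                     stringsAsFactors = FALSE)
unlink(tmp)
reacts
```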
The header for the reactions list may have the following columns:
"abbreviation" | a unique reaction id |
"name" | a reaction name |
"equation" | the reaction equation |
"reversible" | TRUE, if the reaction is reversible |
"compartment" | reaction compartment(s) (currently unused) |
"lowbnd" | lower bound |
"uppbnd" | upper bound |
"obj_coef" | objective coefficient |
"rule" | gene to reaction association |
"subsystem" | subsystem of the reaction |
Every entry except for "equation" is optional. If there are missing values in
field "lowbnd", they will be set to -1 * def_bnd; if there are missing values
in field "uppbnd", they will be set to def_bnd; if there are missing values in
field "obj_coef", they will be set to 0.
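The replacement of missing values described above can be sketched on a small, made-up data frame. The value of def_bnd is an assumption chosen for the example; the actual default comes from SYBIL_SETTINGS("MAXIMUM").

```r
def_bnd <- 1000   # assumed value for illustration only

# Made-up reaction list with some missing bounds and coefficients.
reacts <- data.frame(
  abbreviation = c("R1", "R2", "R3"),
  lowbnd       = c(0, NA, -10),
  uppbnd       = c(NA, 50, 10),
  obj_coef     = c(1, NA, NA)
)

# Fill missing entries as described in the text.
reacts$lowbnd[is.na(reacts$lowbnd)]     <- -1 * def_bnd
reacts$uppbnd[is.na(reacts$uppbnd)]     <- def_bnd
reacts$obj_coef[is.na(reacts$obj_coef)] <- 0
reacts
```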
The header for the metabolites list may have the following columns:
"abbreviation" | a unique metabolite id |
"name" | a metabolite name |
"compartment" | metabolite compartment (currently unused) |
If a metabolite list is provided, it is supposed to contain at least the
entries "abbreviation" and "name".
The header for the model description file may have the following columns:
"name" | a name for the model |
"id" | a shorter model id |
"description" | a model description |
"compartment" | the compartments |
"abbreviation" | unique compartment abbreviations |
"Nmetabolites" | number of metabolites |
"Nreactions" | number of reactions |
"Ngenes" | number of independent genes |
"Nnnz" | number of non-zero elements in the stoichiometric matrix |
If a file contains a certain column name, there must be no empty entries.
If a model description file is provided, it is supposed to contain at least
the entries "name" and "id". Otherwise, the file name of the reactions list
will be used instead (the file name extension and the string _react at the
end of the file name will be removed).
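The file name rule above amounts to two substitutions. The following is a sketch using base R only; the file name is taken from the example data set, and the variable names are hypothetical.

```r
react_file <- "Ec_core_react.tsv"

# Drop the file name extension, then the trailing "_react".
no_ext   <- sub("\\.[^.]*$", "", basename(react_file))
model_id <- sub("_react$", "", no_ext)
model_id
```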
The compartments in which a reaction takes place are determined by the compartment flags of the participating metabolites.
All fields in the output files of modelorg2tsv are in double quotes. In order
to read them, set argument quoteChar to "\"".
Please read the package vignette for detailed information about input formats and examples.
If a metabolite is used more than once as product or reactant of a particular
reaction, it is merged: a + (2) a is converted to (3) a and a warning will be
given.
If a metabolite is used first as reactant and then as product of a particular
reaction, the reaction is balanced: (2) b + a -> b + c is converted to
b + a -> c.
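Both rules can be pictured as summing signed stoichiometric coefficients per metabolite. The sketch below is a simplification under that assumption: it collapses merging and balancing into one step and gives no warnings, whereas readTSVmod treats them as separate, switchable steps. The helper net_coefficients is hypothetical.

```r
# Hypothetical helper: sum coefficients per metabolite id.
# Reactants carry negative, products positive coefficients.
net_coefficients <- function(ids, coefs) {
  net <- vapply(split(coefs, ids), sum, numeric(1))
  net[net != 0]          # metabolites that cancel out disappear
}

# Merging: a + (2) a  becomes  (3) a  (as a reactant: -3)
merged <- net_coefficients(c("a", "a"), c(-1, -2))

# Balancing: (2) b + a -> b + c  becomes  b + a -> c
balanced <- net_coefficients(c("b", "a", "b", "c"), c(-2, -1, 1, 1))
balanced
```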
A binary version of the stoichiometric matrix \(S\) is constructed via \(\left|S\right| > tol\).
A binary version of the stoichiometric matrix \(S\) is scanned for reactions
and metabolites which are not used in S. If there are some, a warning will be
given and the corresponding reactions and metabolites will be removed from
the model if remUnusedMetReact is set to TRUE.
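A minimal sketch of this check on a made-up matrix follows. The tolerance value and the matrix are assumptions for illustration; only abs(S) > tol, rowSums and colSums are taken from the text.

```r
tol <- 1e-6   # assumed tolerance for the example

# Made-up stoichiometric matrix: metabolite x and reaction R3 are unused.
S <- matrix(c(-1,  0, 0,
               1, -1, 0,
               0,  0, 0,
               0,  1, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("a", "b", "x", "c"), c("R1", "R2", "R3")))

B <- abs(S) > tol                              # binary version of S
unused_met   <- rownames(S)[rowSums(B) == 0]   # metabolites never used
unused_react <- colnames(S)[colSums(B) == 0]   # reactions without metabolites

# Removing them corresponds to remUnusedMetReact = TRUE.
S_clean <- S[rowSums(B) > 0, colSums(B) > 0, drop = FALSE]
```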
The binary version of the stoichiometric matrix \(S\) is scanned for
metabolites which are used only once in S. If there are some, at least a
warning will be given. If either constrMet or remMet is set to TRUE, the
binary version of \(S\) is scanned for paths of singleton metabolites. If
constrMet is set to TRUE, reactions containing those metabolites will be
constrained to zero; if remMet is set to TRUE, the metabolites and the
reactions containing those metabolites will be removed from the network.
In order to find paths of singleton metabolites, a binary version of the stoichiometric matrix \(S\) is used. The row sums give the vector of metabolite usage: each element is the number of reactions a metabolite participates in. A singleton metabolite is a metabolite with a row sum of one. All columns in \(S\) (reactions) containing singleton metabolites will be set to zero, and the search for singleton metabolites is repeated until none are found.
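The iterative search described above can be sketched as follows. The function find_singleton_paths is a hypothetical illustration written from this description, not sybil's implementation, and the example matrix is made up.

```r
# Hypothetical sketch: repeatedly mark singleton metabolites and the
# reactions containing them, then blank those reactions and search again.
find_singleton_paths <- function(S, tol = 1e-6) {
  B     <- abs(S) > tol                 # binary version of S
  met   <- logical(nrow(B))             # metabolites found so far
  react <- logical(ncol(B))             # reactions found so far
  repeat {
    single <- rowSums(B) == 1           # row sum of one: singleton
    if (!any(single)) break
    hit <- colSums(B[single, , drop = FALSE]) > 0
    met[single] <- TRUE
    react[hit]  <- TRUE
    B[, hit]    <- FALSE                # set those columns to zero
  }
  list(met = met, react = react)
}

# a occurs only in R1; once R1 is blanked, b occurs only in R2.
S <- matrix(c(-1,  0,  0, 0,
               1, -1,  0, 0,
               0,  1, -1, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("R1", "R2", "R3", "R4")))
res <- find_singleton_paths(S)
```

In this example the search stops after two rounds, because metabolite c still takes part in two reactions once R1 and R2 have been blanked.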
The algorithm to find dead end metabolites works in a similar way, but does
not use the binary version of the stoichiometric matrix. Here, metabolite i
is considered a dead end if, for example, it is produced by reaction j but
not used by any other reaction k.
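A single pass of such a check might look like the sketch below. It rests on strong simplifying assumptions that are not taken from the source: every reaction is treated as irreversible, negative entries mean consumption, positive entries mean production, and the repeated search is left out. The function find_dead_ends is hypothetical.

```r
# Hypothetical one-pass sketch: a metabolite is flagged if it is produced
# but never consumed, or consumed but never produced.
find_dead_ends <- function(S, tol = 1e-6) {
  produced <- rowSums(S >  tol) > 0
  consumed <- rowSums(S < -tol) > 0
  xor(produced, consumed)
}

# Made-up network: a is consumed by R1 but produced by no reaction.
S <- matrix(c(-1,  0,  0, 0,
               1, -1,  0, 0,
               0,  1, -1, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("R1", "R2", "R3", "R4")))
dead <- find_dead_ends(S)
dead
```

Unlike the singleton search, this uses the signs of the entries in S, which is why the binary matrix is not sufficient here.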
The BiGG database http://bigg.ucsd.edu/.
Schellenberger, J., Park, J. O., Conrad, T. C., and Palsson, B. Ø. (2010) BiGG: a Biochemical Genetic and Genomic knowledgebase of large scale metabolic reconstructions. BMC Bioinformatics 11, 213.
Becker, S. A., Feist, A. M., Mo, M. L., Hannum, G., Palsson, B. Ø. and Herrgard, M. J. (2007) Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox. Nat Protoc 2, 727-738.
Schellenberger, J., Que, R., Fleming, R. M. T., Thiele, I., Orth, J. D., Feist, A. M., Zielinski, D. C., Bordbar, A., Lewis, N. E., Rahmanian, S., Kang, J., Hyduke, D. R. and Palsson, B. Ø. (2011) Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox v2.0. Nat Protoc 6, 1290-1307.
# NOT RUN {
## read example dataset
mp <- system.file(package = "sybil", "extdata")
mod <- readTSVmod(prefix = "Ec_core", fpath = mp, quoteChar = "\"")
## redirect warnings to a log file
sink(file = "warn.log")
mod <- readTSVmod(prefix = "Ec_core", fpath = mp, quoteChar = "\"")
warnings()
sink()
unlink("warn.log")
## print no warnings
suppressWarnings(
mod <- readTSVmod(prefix = "Ec_core", fpath = mp, quoteChar = "\""))
## print no messages
suppressMessages(
mod <- readTSVmod(prefix = "Ec_core", fpath = mp, quoteChar = "\""))
# }
# NOT RUN {
## set number of warnings to keep
options(nwarnings = 1000)
## redirect every output to a file
zz <- file("log.Rout", open = "wt")
sink(zz)
sink(zz, type = "message")
mod <- readTSVmod(prefix = "Ec_core", fpath = mp, quoteChar = "\"")
warnings()
sink(type = "message")
sink()
close(zz)
# }