WARNING: This method is very much in an alpha stage. Expect it to change.
This method extends the default read.table function in R. A map from
column names to column classes can be specified, so that the column
classes are assigned automatically from the column header in the file.
In addition, any subset of rows can be read. The method is optimized so that only the columns and rows of interest are parsed and read into R's memory, which reduces memory usage and speeds up reading.
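The column-name-to-class mapping described above can be sketched in base R. The helper resolveColClasses() below is hypothetical and exists only for illustration; it mirrors the colClasses, isPatterns and defColClass arguments but is not the actual implementation of this method.

```r
# Hypothetical helper (not part of the documented API): resolve a named
# column-class map against the column names found in a file header.
resolveColClasses <- function(colNames, colClasses, isPatterns = FALSE,
                              defColClass = NA) {
  # Every column starts out with the default class.
  classes <- rep(as.character(defColClass), length(colNames))
  names(classes) <- colNames
  for (key in names(colClasses)) {
    if (isPatterns) {
      hits <- grepl(key, colNames)   # regular-expression match
    } else {
      hits <- (colNames == key)      # exact match
    }
    classes[hits] <- colClasses[[key]]
  }
  classes
}

# A small tab-delimited table, used only for illustration.
lines <- c("id\tname\tscore", "1\ta\t0.5", "2\tb\t1.5")
header <- strsplit(lines[1], "\t", fixed = TRUE)[[1]]

# Map two columns by name; unmatched columns get the default class.
classes <- resolveColClasses(header, c(id = "integer", score = "numeric"),
                             defColClass = "character")
df <- read.table(text = lines, header = TRUE, sep = "\t",
                 colClasses = unname(classes))

# Pattern-based matching: every column starting with "x" becomes numeric.
byPattern <- resolveColClasses(c("x1", "x2", "y"), c("^x" = "numeric"),
                               isPatterns = TRUE)
```

Resolving the map before calling read.table() keeps the parsing itself unchanged: the sketch only decides which class each column gets.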
"readTable"(file, colClasses=NULL, isPatterns=FALSE, defColClass=NA, header=FALSE, skip=0, nrows=-1, rows=NULL, col.names=NULL, check.names=FALSE, path=NULL, ..., stripQuotes=TRUE, method=c("readLines", "intervals"), verbose=FALSE)
file: A connection or a filename. If a filename, the path specified by path is added to the front of the filename. Unopened files are opened and closed at the end.
colClasses: A character vector. If unnamed, it specifies the column classes just as used by read.table. If it is a named vector, names(colClasses) are used to match the column names read (this requires that header=TRUE) and the column classes are set to the corresponding values.
isPatterns: If TRUE, the matching of names(colClasses) to the read column names is done by regular expression matching.
defColClass: If the colClasses argument does not match some of the read column names, the column class is by default set to this class. The default is to read the columns in an "as is" way.
header: If TRUE, column names are read from the file.
skip: The number of lines to skip before reading the header or, if there is no header, the data table.
nrows: The number of rows of the data table to read. Ignored if rows is specified.
rows: A vector specifying which rows of the table to read; row one is the row following the header. Non-existing rows are ignored. Note that rows are returned in the same order they are requested, and duplicated rows are also returned.
col.names: Same as in read.table().
check.names: Same as in read.table(), but the default value is FALSE here.
path: If file is a filename, this path is added to it, otherwise ignored.
...: Arguments passed to the read.table used internally.
stripQuotes: If TRUE, quotes are stripped from values before being parsed. This argument is only effective when method=="readLines".
method: If "readLines", readLines() is used internally to first read only the rows of interest, which are then passed to read.table(). If "intervals", contiguous intervals are first identified in the rows of interest. These intervals are then read one by one using read.table(). The latter method is faster and especially more memory efficient if the intervals are not too many, whereas the former is preferred if many "scattered" rows are to be read.
Returns a data.frame.
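The "readLines" approach to reading selected rows can be approximated with base R. The helper readRowsViaLines() below is hypothetical and is not the internal code of this method. For simplicity it reads all lines before subsetting, whereas the documented method aims to read only the rows of interest. It assumes a file with a single header line.

```r
# Hypothetical helper illustrating the "readLines" idea with base R only.
readRowsViaLines <- function(file, rows, ...) {
  lines <- readLines(file)
  hdr <- lines[1]      # assumption: exactly one header line
  body <- lines[-1]    # row one is the line following the header
  # Ignore requests for non-existing rows; keep order and duplicates.
  rows <- rows[rows >= 1 & rows <= length(body)]
  read.table(text = c(hdr, body[rows]), header = TRUE, ...)
}

# Write a small example table to a temporary file.
tmp <- tempfile(fileext = ".txt")
writeLines(c("id value", "1 10", "2 20", "3 30", "4 40"), tmp)

# Row 3 is requested twice and row 99 does not exist.
df <- readRowsViaLines(tmp, rows = c(3, 1, 3, 99))
unlink(tmp)
```

Selecting the text lines first means that read.table() only ever parses the requested rows, which is what makes this approach suitable for many scattered rows.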
See also: readTableIndex(), read.table, colClasses().
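The "intervals" strategy mentioned for the method argument can be sketched in the same way. Both helpers below, rowsToIntervals() and readRowsViaIntervals(), are hypothetical names introduced for illustration. The sketch assumes a file without a header line and that every requested row exists; it does not reproduce the actual implementation.

```r
# Hypothetical helper: split requested rows into contiguous intervals.
rowsToIntervals <- function(rows) {
  rows <- sort(unique(rows))
  # A new interval starts wherever the gap to the previous row exceeds one.
  starts <- c(TRUE, diff(rows) > 1)
  ends <- c(starts[-1], TRUE)
  cbind(from = rows[starts], to = rows[ends])
}

# Hypothetical helper: read each interval with one read.table() call.
readRowsViaIntervals <- function(file, rows, ...) {
  iv <- rowsToIntervals(rows)
  parts <- lapply(seq_len(nrow(iv)), function(i) {
    read.table(file, header = FALSE,
               skip = iv[i, "from"] - 1,                 # lines before the interval
               nrows = iv[i, "to"] - iv[i, "from"] + 1,  # length of the interval
               ...)
  })
  res <- do.call(rbind, parts)
  # Restore the requested order, including duplicated rows.
  res[match(rows, sort(unique(rows))), , drop = FALSE]
}

iv <- rowsToIntervals(c(2, 3, 4, 8, 9, 15))

tmp <- tempfile(fileext = ".txt")
writeLines(paste(1:6, (1:6) * 10), tmp)
df <- readRowsViaIntervals(tmp, rows = c(5, 2, 3))
unlink(tmp)
```

Each interval costs one read.table() call, which is why this strategy pays off when the requested rows form a few long runs rather than many isolated rows.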