Creating and accessing sources.
SimpleSource(encoding = "",
             length = 0,
             position = 0,
             reader = readPlain,
             ...,
             class)
getSources()
# S3 method for SimpleSource
close(con, ...)
# S3 method for SimpleSource
eoi(x)
# S3 method for DataframeSource
getMeta(x)
# S3 method for DataframeSource
getElem(x)
# S3 method for DirSource
getElem(x)
# S3 method for URISource
getElem(x)
# S3 method for VectorSource
getElem(x)
# S3 method for XMLSource
getElem(x)
# S3 method for SimpleSource
length(x)
# S3 method for SimpleSource
open(con, ...)
# S3 method for DataframeSource
pGetElem(x)
# S3 method for DirSource
pGetElem(x)
# S3 method for URISource
pGetElem(x)
# S3 method for VectorSource
pGetElem(x)
# S3 method for SimpleSource
reader(x)
# S3 method for SimpleSource
stepNext(x)
x: A Source.
con: A Source.
encoding: a character giving the encoding of the elements delivered by the source.
length: a non-negative integer denoting the number of elements delivered
by the source. If the length is unknown in advance, set it to 0.
position: a numeric indicating the current position in the source.
reader: a reader function (generator).
...: for SimpleSource, tag-value pairs for storing additional
information; not used otherwise.
class: a character vector giving additional classes to be used for the created source.
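The following is a minimal sketch of how the constructor arguments end up in the created object, assuming package tm is installed. The tag `content` and the class name `DemoSource` are made up for illustration and are not part of tm.

```r
library(tm)

# Build a source by hand. `content` is a hypothetical tag-value pair passed
# through `...`; `DemoSource` is a hypothetical additional class.
src <- SimpleSource(encoding = "UTF-8",
                    length = 2,
                    position = 0,
                    reader = readPlain,
                    content = c("a", "b"),
                    class = "DemoSource")

class(src)     # the additional class, followed by the base classes
src$encoding   # the encoding passed in
src$content    # the extra tag-value pair is stored alongside the other fields
```

Storing the payload as a tag-value pair is one way for a custom getElem method to find its data later.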
For SimpleSource, an object inheriting from the classes given in class,
from SimpleSource, and from Source.
For getSources, a character vector with sources provided by package tm.
open and close return the opened and closed source, respectively.
For eoi, a logical indicating whether the end of input of the source has
been reached.
For getElem, a named list with the components content holding the
document and uri giving a uniform resource identifier (e.g., a file
path or URL; NULL if not applicable or unavailable). For pGetElem, a
list of such named lists.
For length, an integer for the number of elements.
For reader, a function for the default reader.
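As an illustration of these return values, the sketch below inspects a small VectorSource, assuming package tm is installed. The two example strings are arbitrary. A fresh source starts at position 0, so the position is advanced once before the first element is fetched.

```r
library(tm)

src <- VectorSource(c("first document", "second document"))

n <- length(src)         # number of elements in the source

src <- stepNext(src)     # move from position 0 to the first element
elem <- getElem(src)     # named list with components content and uri
names(elem)
elem$content
is.null(elem$uri)        # a plain vector has no file path or URL

all_elems <- pGetElem(src)   # list of such named lists, one per element
length(all_elems)

is.function(reader(src))     # the default reader is a function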
Sources abstract input locations, like a directory, a connection, or
simply an R vector, in order to acquire content in a uniform way. In packages
which employ the infrastructure provided by package tm, such sources are
represented via the virtual S3 class Source: such packages then provide
S3 source classes extending the virtual base class (such as DirSource
provided by package tm itself).
All extension classes must provide implementations for the functions
close, eoi, getElem, length, open, reader, and stepNext. For parallel
element access the (optional) function pGetElem must be provided as well.
If document-level metadata is available, the (optional) function getMeta
must be implemented.
The functions open and close open and close the source, respectively.
eoi indicates end of input. getElem fetches the element at the current
position, whereas pGetElem retrieves all elements in parallel at once.
The function length gives the number of elements. reader returns a
default reader for processing elements. stepNext increases the position
in the source to acquire the next element.
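Taken together, these functions support a simple traversal loop. The sketch below walks a VectorSource from start to end, assuming package tm is installed. The input strings and the variable `contents` are illustrative only. The result of open, stepNext, and close is assigned back each time, since each call returns the updated source.

```r
library(tm)

src <- VectorSource(c("alpha", "beta", "gamma"))

src <- open(src)
contents <- character(0)
while (!eoi(src)) {
  src <- stepNext(src)                           # advance to the next element
  contents <- c(contents, getElem(src)$content)  # fetch element at that position
}
src <- close(src)

contents
```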
The function SimpleSource provides a simple reference implementation
and can be used when creating custom sources.
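A custom source built this way might look like the following sketch, assuming package tm is installed. The class `LineSource`, its constructor, and the tag `lines` are hypothetical names chosen for this example. Only getElem is defined here, and the remaining functions are expected to fall back to the SimpleSource methods.

```r
library(tm)

# Hypothetical constructor: a source that delivers the lines of one string.
LineSource <- function(text) {
  lines <- strsplit(text, "\n", fixed = TRUE)[[1]]
  SimpleSource(length = length(lines), lines = lines, class = "LineSource")
}

# Element access for the hypothetical class: the line at the current
# position, with no URI since the content does not come from a file or URL.
getElem.LineSource <- function(x)
  list(content = x$lines[x$position], uri = NULL)

src <- LineSource("one\ntwo\nthree")
length(src)            # inherited from SimpleSource
src <- stepNext(src)   # inherited from SimpleSource
first <- getElem(src)  # dispatches to getElem.LineSource
first$content
```

A source meant for parallel access would additionally need a pGetElem method returning a list of such elements.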
See also: DataframeSource, DirSource, URISource, VectorSource, and
XMLSource.