Create or test for text objects.
as_text(x, filter = text_filter(x), ...)
is_text(x)
x: object to be coerced or tested.
filter: text filter object for the converted text.
...: further arguments passed to or from other methods.
as_text attempts to coerce its argument to text type and to set its text_filter property; it strips all other attributes except for names.
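To make "strips all other attributes except for names" concrete, the base R sketch below applies the same rule to an ordinary vector. The helper keep_only_names() is hypothetical and not part of corpus. It only illustrates the attribute rule and does not create a text object.

```r
# keep_only_names() is a hypothetical helper, not part of the corpus package.
# It drops every attribute of x (class, custom attributes, names, ...) and
# then restores the names, if x had any.
keep_only_names <- function(x) {
  nms <- names(x)
  attributes(x) <- NULL
  if (!is.null(nms)) names(x) <- nms
  x
}

# A named character vector carrying an extra attribute and a custom class.
x <- structure(c(a = "goodnight", b = "moon"), source = "book", class = "note")
y <- keep_only_names(x)

names(y)       # "a" "b"
attributes(y)  # only $names remains
```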
is_text returns TRUE or FALSE depending on whether its argument is of text type.
The corpus_text type is a new data type provided by the corpus package, suitable for processing Unicode text. Text vectors behave like character vectors (and can be converted to them with the as.character function). They can be created using the read_ndjson function or by converting another object with the as_text function.
All text objects have a text_filter property specifying how to transform the text into tokens or segment it into sentences.
The default behavior for as_text is as follows:

- If x is a character vector, then we create a new text vector from x, preserving names(x) if they exist.
- If x is a data frame, then we call as_text on x$text if a column named "text" exists in the data frame. We set the names of the result to the data frame's row names, if they exist. If the data frame does not have a column named "text", then we fail with an error message.
- If x is a corpus_text object, then we drop all attributes from the object except for its names and filter, and we set the object class to corpus_text.
When none of the above conditions holds, the default behavior is to call as.character on the object first, and then call as_text on the resulting character vector.
In all cases, we set the text_filter property of the result to the filter argument given to as_text.
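The dispatch rules above can be sketched in base R as an S3 generic. This is a minimal, runnable illustration under stated assumptions, not the corpus package's code. The generic as_text_sketch(), the helper new_text_sketch(), and the class name "corpus_text_sketch" are all hypothetical. The filter is treated as an opaque value stored in a "filter" attribute, and attr(x, "filter") stands in for the real text_filter(x) default. Only explicit (non-automatic) data frame row names are treated as "existing", which is an interpretation of the text, not a documented detail.

```r
# Hypothetical sketch of the as_text() rules; not the corpus implementation.
as_text_sketch <- function(x, filter = attr(x, "filter"), ...) {
  UseMethod("as_text_sketch")
}

# Shared finishing step: keep only names, then attach the filter and class.
new_text_sketch <- function(chars, nms, filter) {
  force(nms)
  force(filter)
  attributes(chars) <- NULL
  if (!is.null(nms)) names(chars) <- nms
  attr(chars, "filter") <- filter          # NULL leaves no attribute
  class(chars) <- "corpus_text_sketch"
  chars
}

# Character vector: preserve names(x).
as_text_sketch.character <- function(x, filter = attr(x, "filter"), ...) {
  new_text_sketch(x, names(x), filter)
}

# Data frame: convert the "text" column and name the result by row names.
as_text_sketch.data.frame <- function(x, filter = attr(x, "filter"), ...) {
  if (!"text" %in% names(x)) {
    stop("data frame has no column named \"text\"")
  }
  out <- as_text_sketch(x$text, filter = filter, ...)
  # Assumption: automatic row names (1, 2, ...) do not count as existing.
  if (.row_names_info(x) > 0L) names(out) <- row.names(x)
  out
}

# Existing text object: keep only its names and filter.
as_text_sketch.corpus_text_sketch <- function(x, filter = attr(x, "filter"), ...) {
  new_text_sketch(x, names(x), filter)
}

# Anything else: call as.character() first (which drops names).
as_text_sketch.default <- function(x, filter = attr(x, "filter"), ...) {
  as_text_sketch(as.character(x), filter = filter, ...)
}
```

Because the fallback goes through as.character, a named numeric vector loses its names in this sketch, which matches the order of operations described for the default case.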
Note that this special handling of names differs from the other R conversion functions (as.numeric, as.character, etc.), which drop the names.
as_text is generic: you can write methods to handle specific classes of objects.
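The S3 pattern for such a method is sketched below with a stand-in generic, so that it runs without corpus installed. The generic to_text(), the "text_sketch" class, and the "email" class are all hypothetical. Under ordinary S3 naming, the analogous method for the real generic would be called as_text.email, although its filter handling would need to follow corpus's own conventions.

```r
# Hypothetical stand-in generic with the same shape as as_text(x, filter, ...).
to_text <- function(x, filter = NULL, ...) UseMethod("to_text")

# Fallback: coerce to character and tag with a filter and a stand-in class.
to_text.default <- function(x, filter = NULL, ...) {
  out <- as.character(x)
  attr(out, "filter") <- filter
  class(out) <- "text_sketch"
  out
}

# Method for a hypothetical "email" class (a list with subject and body):
# convert only the body, then defer to the generic again.
to_text.email <- function(x, filter = NULL, ...) {
  to_text(x$body, filter = filter, ...)
}

msg <- structure(list(subject = "Hi", body = "See you at noon."), class = "email")
txt <- to_text(msg)
unclass(txt)  # "See you at noon."
```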
as_text("hello, world!")
as_text(c(a="goodnight", b="moon")) # keeps names
is_text("hello") # FALSE, "hello" is character, not text