fulltext (version 2.0)

ft_table: Collect metadata and text into a data.frame

Description

Facilitates downstream processing with text mining packages by providing metadata and full text in a tidy data.frame format.

Usage

ft_table(path = NULL, type = NULL, encoding = NULL, xml_extract_text = TRUE)

Arguments

path

(character) a directory path; the directory must exist

type

(character) type of files to get. One of pdf, xml, or plain (file extensions pdf, xml, and txt, respectively). Default: NULL, which gets all types

encoding

(character) file encoding. If NULL, the value of getOption("encoding") is used

xml_extract_text

(logical) for XML files, whether to extract the text (TRUE) or return the XML as a string (FALSE). Default: TRUE
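
The mapping between type values and file extensions described above can be illustrated with a small base-R sketch. This is not fulltext's implementation: ext_for_type, files_of_type, and the demo directory are hypothetical names made up for illustration, and the sketch only lists matching files rather than reading them.

```r
# Hypothetical mapping from `type` values to file extensions,
# taken from the argument description (not from fulltext's source)
ext_for_type <- c(pdf = "pdf", xml = "xml", plain = "txt")

# Hypothetical helper: list files under `path` that match `type`;
# type = NULL selects all three extensions
files_of_type <- function(path, type = NULL) {
  stopifnot(dir.exists(path))
  exts <- if (is.null(type)) {
    ext_for_type
  } else {
    ext_for_type[match.arg(type, names(ext_for_type))]
  }
  pattern <- paste0("\\.(", paste(exts, collapse = "|"), ")$")
  list.files(path, pattern = pattern, full.names = TRUE)
}

# Demo directory with empty placeholder files (illustration only)
demo <- file.path(tempdir(), "ft_type_demo")
dir.create(demo, showWarnings = FALSE)
invisible(file.create(file.path(demo, c("a.pdf", "b.xml", "c.txt", "d.csv"))))

basename(files_of_type(demo, "plain"))  # only the .txt file
length(files_of_type(demo))             # pdf, xml, and txt; the csv is skipped

# The value consulted when `encoding` is NULL
getOption("encoding")
```

In a session where the option has not been changed, getOption("encoding") is "native.enc", meaning the platform's native encoding.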

Details

You can alternatively use readtext::readtext() or similar functions to achieve a similar outcome.
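
As a rough sketch of the readtext route mentioned above, assuming the readtext package is installed: the directory and file names are invented for illustration, and the result has readtext's own columns (doc_id and text), which need not match what ft_table() returns.

```r
library(readtext)

# Hypothetical demo directory with two plain-text files (illustration only)
demo_dir <- file.path(tempdir(), "ft_readtext_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("First article text.", file.path(demo_dir, "a.txt"))
writeLines("Second article text.", file.path(demo_dir, "b.txt"))

# Read every .txt file in the directory into a data.frame, one row per file
docs <- readtext(file.path(demo_dir, "*.txt"))

# Inspect the document identifiers and the length of each text
docs$doc_id
nchar(docs$text)
```

readtext() accepts glob patterns, so the same call can target other extensions by changing the pattern.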

Examples

if (interactive()) {
## from a directory path
x <- ft_table()
x

## only xml
ft_table(type = "xml")

## only pdf
ft_table(type = "pdf")

## don't pull text out of xml, just give back the xml please
x <- ft_table(xml_extract_text = FALSE)
x
}