
pdfsearch: Search Tools for PDF Files

Brandon LeBeau

Includes functions for keyword search of PDF files, along with a wrapper that searches all files within a single directory.

convert_tokens: Ability to tokenize words.

Arguments

<dl><dt>x</dt>
<dd>The text of the PDF file, either specified directly or given as a file path from which the pdftools package reads the PDF. To read from a file path, set the path argument to TRUE.</dd>
<dt>path</dt>
<dd>TRUE/FALSE indicating whether x is the path to the PDF file to be converted to text. When TRUE, the pdftools package is used for this conversion.</dd>
<dt>split_pdf</dt>
<dd>TRUE/FALSE indicating whether to split the PDF text into columns using white space. This is most useful for multicolumn PDF files: the split_pdf function attempts to reflow the columns into a single column, starting with the left column and proceeding to the right.</dd>
<dt>remove_hyphen</dt>
<dd>TRUE/FALSE indicating whether words hyphenated across a line break should be rejoined onto a single line. Default is TRUE.</dd>
<dt>token_function</dt>
<dd>A tokenizing function from the tokenizers package. The default is tokenize_words.</dd></dl>
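Taken together, the arguments describe three steps applied to the extracted text: optional column reflow (split_pdf), rejoining of hyphenated words (remove_hyphen), and tokenization (token_function). The Python sketch below illustrates those steps on text that has already been extracted. It is not the pdfsearch implementation, which is written in R and relies on pdftools and tokenizers. The helper names (split_columns, remove_hyphens, simple_tokenize_words, convert_tokens_sketch), the order of the steps, and the rule that columns are separated by two or more spaces are all assumptions made for illustration.

```python
import re
from typing import Callable, List


def split_columns(lines: List[str]) -> List[str]:
    """Reflow two-column text into one column, left column first.

    Assumption (hypothetical rule): the columns on each line are separated
    by a run of two or more spaces, as in layout-preserving text extraction.
    """
    left, right = [], []
    for line in lines:
        # Split once at the first wide gap; a line that starts with a gap
        # has an empty left part and belongs only to the right column.
        parts = re.split(r" {2,}", line.rstrip(), maxsplit=1)
        left.append(parts[0].strip())
        if len(parts) == 2:
            right.append(parts[1].strip())
    return [ln for ln in left + right if ln]


def remove_hyphens(lines: List[str]) -> List[str]:
    """Rejoin a word split by a hyphen at the end of a line.

    The remainder of the word is pulled up onto the first line, so
    ["the imple-", "mentation works"] becomes ["the implementation", "works"].
    """
    text = "\n".join(lines)
    text = re.sub(r"(\w)-\n\s*(\w+)", r"\1\2\n", text)
    return [ln.strip() for ln in text.split("\n") if ln.strip()]


def simple_tokenize_words(text: str) -> List[str]:
    """Rough stand-in for word tokenization: lowercase, drop punctuation."""
    return re.findall(r"\w+(?:'\w+)*", text.lower())


def convert_tokens_sketch(
    lines: List[str],
    split_pdf: bool = False,
    remove_hyphen: bool = True,
    token_function: Callable[[str], List[str]] = simple_tokenize_words,
) -> List[List[str]]:
    """Hypothetical pipeline: reflow columns, rejoin hyphens, tokenize."""
    if split_pdf:
        lines = split_columns(lines)
    if remove_hyphen:
        lines = remove_hyphens(lines)
    # One list of tokens per line of the reflowed text.
    return [token_function(line) for line in lines]


# A two-column page as it might look after layout-preserving extraction.
sample_page = [
    "The imple-      Results were",
    "mentation is    reported for",
    "simple.         each file.",
]

for tokens in convert_tokens_sketch(sample_page, split_pdf=True):
    print(tokens)
```

With split_pdf=True the left column is read before the right one and "imple-" / "mentation" is rejoined into "implementation". With split_pdf=False, the first line instead yields the, imple, results, were: the two columns are mixed and the hyphenated word stays broken. Note that a simple rule like this one also merges genuine compounds that happen to break at a line end ("well-" followed by "known" becomes "wellknown").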

