getPDF: Extract text from PDF files and return a word-occurrence data.frame.

<code>getPDF</code> extracts the text of PDF files and returns a word-occurrence data.frame.
It requires <code>XPDF</code> (http://www.foolabs.com/xpdf/download.html), whose <code>pdftotext</code> utility performs the text extraction,
and uses the <code>parallel</code> package to perform parallel computation.
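The page states only that getPDF relies on <code>pdftotext</code> and the <code>parallel</code> package; it does not say how the work is split across workers. As a minimal sketch of one common pattern, the hypothetical helpers <code>countWords()</code> and <code>countWordsParallel()</code> below count words in several documents with <code>parallel::parLapply()</code>, one document per task. To keep the example self-contained, in-memory strings stand in for the text that <code>pdftotext</code> would extract from each PDF; this is not the package's own code.

```r
library(parallel)

# Hypothetical helper (not part of inpdfr): word counts for one document.
# In practice the text could come from XPDF, for example:
#   system2("pdftotext", c(shQuote(pdfFile), "-"), stdout = TRUE)
# where "-" asks pdftotext to write the extracted text to standard output.
countWords <- function(text) {
  words <- unlist(strsplit(tolower(text), "[^[:alpha:]]+"))
  words <- words[nzchar(words)]          # drop empty strings
  sort(table(words), decreasing = TRUE)  # most frequent words first
}

# Hypothetical helper: process each document on a separate worker.
countWordsParallel <- function(texts, cores = 2) {
  cl <- makeCluster(cores)  # PSOCK cluster, portable across operating systems
  on.exit(stopCluster(cl))  # shut the workers down even if an error occurs
  parLapply(cl, texts, countWords)
}

counts <- countWordsParallel(c("the cat and the dog", "a dog, a dog"))
```

A PSOCK cluster created with <code>makeCluster()</code> works on Windows as well as on Unix-like systems, whereas <code>parallel::mclapply()</code> relies on forking and cannot use more than one core on Windows.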

A set of functions to analyse and compare texts, using classical
text mining functions as well as those from theoretical ecology.
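The description does not list which ecological measures the package provides. Purely as an illustration of the analogy, and not as a description of inpdfr's functions, each distinct word can be treated as a species and its frequency as its abundance; a standard diversity measure such as the Shannon index, H' = -sum(p_i ln p_i), then applies directly to a word-occurrence table. The function name <code>shannonDiversity()</code> is hypothetical.

```r
# Hypothetical illustration (not an inpdfr function): Shannon diversity
# H' = -sum(p_i * log(p_i)), where p_i is the relative frequency of word i.
shannonDiversity <- function(freq) {
  p <- freq[freq > 0] / sum(freq)  # relative frequencies, zero counts dropped
  -sum(p * log(p))                 # natural logarithm
}

# Word counts treated as species abundances
shannonDiversity(c(the = 3, cat = 2, end = 1))
```

Evenly spread word frequencies give higher values, while a text dominated by a single word gives a value close to zero.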

Rebaudo Francois

inpdfr

Analyse Text Documents Using Ecological Tools

Arguments

<dl><dt>myPDFs</dt>
<dd>A character vector containing PDF file names.</dd>
<dt>minword</dt>
<dd>An integer specifying the minimum number of letters a word must
have to be included in the returned data.frame.</dd>
<dt>maxword</dt>
<dd>An integer specifying the maximum number of letters a word may
have to be included in the returned data.frame.</dd>
<dt>minFreqWord</dt>
<dd>An integer specifying the minimum frequency a word must have to
be included in the returned data.frame.</dd>
<dt>pathToPdftotext</dt>
<dd>A character string containing an alternative path to the XPDF
<code>pdftotext</code> utility; see the Details section.</dd></dl>
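The effect of the three filtering arguments can be illustrated with a small base-R sketch. The helper <code>wordOccurrence()</code> and its output columns <code>word</code> and <code>freq</code> are hypothetical, not the package's implementation: the sketch lower-cases the text, splits it on non-letter characters, keeps words whose length lies between <code>minword</code> and <code>maxword</code>, and then keeps words occurring at least <code>minFreqWord</code> times. How getPDF itself tokenises text and orders its output may differ.

```r
# Hypothetical helper (not part of inpdfr): build a word-occurrence
# data.frame from plain text, filtering in the spirit of getPDF's
# minword, maxword and minFreqWord arguments.
wordOccurrence <- function(text, minword = 1, maxword = Inf, minFreqWord = 1) {
  # Lower-case the text and split it on runs of non-letter characters
  words <- unlist(strsplit(tolower(text), "[^[:alpha:]]+"))
  words <- words[nzchar(words)]  # drop empty strings from leading separators
  # Keep words whose number of letters lies within [minword, maxword]
  n <- nchar(words)
  words <- words[n >= minword & n <= maxword]
  # Count occurrences of each distinct word
  counts <- table(words)
  df <- data.frame(word = names(counts), freq = as.integer(counts),
                   stringsAsFactors = FALSE)
  # Keep sufficiently frequent words, most frequent first, ties alphabetical
  df <- df[df$freq >= minFreqWord, ]
  df <- df[order(-df$freq, df$word), ]
  rownames(df) <- NULL
  df
}

res <- wordOccurrence("The cat saw the other cat. The end.",
                      minword = 3, maxword = 4, minFreqWord = 2)
```

Here only "the" (3 occurrences) and "cat" (2) survive: "other" is dropped for having five letters, and "saw" and "end" for occurring only once.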
