Perform OCR text extraction. This requires the tesseract package to be
installed.
pdf_ocr_text(
  pdf,
  pages = NULL,
  opw = "",
  upw = "",
  dpi = 600,
  language = "eng",
  options = NULL
)

pdf_ocr_data(
  pdf,
  pages = NULL,
  opw = "",
  upw = "",
  dpi = 600,
  language = "eng",
  options = NULL
)
pdf: file path or raw vector with pdf data
pages: which pages of the pdf file to extract
opw: string with owner password to open pdf
upw: string with user password to open pdf
dpi: resolution at which to render the image that is passed to pdf_convert
language: passed to tesseract to specify the language of the engine
options: passed to tesseract to specify OCR parameters
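
The following is a minimal usage sketch rather than an official example. The helper ocr_pdf_pages and the path "scanned.pdf" are hypothetical and are not part of pdftools. The helper calls pdf_ocr_text only when the file exists and both pdftools and tesseract are installed, and returns NULL otherwise.

```r
# Hypothetical helper (not part of pdftools): run OCR on a PDF only when the
# file and the required packages are available; otherwise return NULL.
ocr_pdf_pages <- function(path, pages = NULL, dpi = 600, language = "eng") {
  have_pkgs <- requireNamespace("pdftools", quietly = TRUE) &&
    requireNamespace("tesseract", quietly = TRUE)
  if (!have_pkgs || !file.exists(path)) {
    return(NULL)
  }
  pdftools::pdf_ocr_text(path, pages = pages, dpi = dpi, language = language)
}

# "scanned.pdf" is a placeholder path; substitute a real scanned document.
text <- ocr_pdf_pages("scanned.pdf", pages = 1:2, dpi = 300)
if (!is.null(text)) {
  cat(text[1])  # print the recognized text of the first requested page
}
```

A dpi below the default of 600 is used here on the assumption that a lower resolution renders faster, possibly at some cost in recognition accuracy. pdf_ocr_data accepts the same arguments as pdf_ocr_text.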
Other pdftools: pdftools, qpdf, rendering