captr: R Client for the Captricity API
OCR text and handwritten forms using Captricity. Captricity's big advantage over Abbyy Cloud OCR is that it lets users easily specify the position of the text blocks they want to OCR through a simple web-based UI. The quality of the OCR can be checked using compare_txt from recognize.
Installation
To get the latest version on CRAN:
install.packages("captr")
To get the current development version from GitHub:
install.packages("devtools")
devtools::install_github("soodoku/captr", build_vignettes = TRUE)
Using captr
Read the vignette:
vignette("using_captr", package = "captr")
or follow the overview below.
Start by getting an application token and setting it using:
set_token("token")
Then, create a batch using:
create_batch("batch_name")
Captricity requires a template, which tells it what data to pull from where on the form. Templates can be created using the Web UI. Once you have created a batch, get the template ID and set it using:
set_template_id("id")
Next, assign the template ID to a batch:
set_batch_template("batch_id", "template_id")
Next, upload image(s) to the batch:
upload_image(batch_id="batch_id", path_to_image="image_path")
Next, check whether the batch is ready to be processed:
test_readiness(batch_id="batch_id")
You may also want to find out how much processing the batch will cost:
batch_price(batch_id="batch_id")
Once you are ready, submit the batch:
submit_batch(batch_id="batch_id")
Captricity excels in nomenclature confusion: once a batch is submitted, it is called a job. The ID for the job can be obtained from the list returned by submit_batch; the field name is related_job_id.
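As a minimal sketch of pulling the job ID out of that list: the list below is a hypothetical stand-in for what submit_batch returns. Only the related_job_id field name comes from the text above; the other fields and all values are invented for illustration.

```r
# Hypothetical stand-in for the list returned by submit_batch().
# Only `related_job_id` is documented above; the rest is illustrative.
submitted <- list(
  id = 1234,
  name = "batch_name",
  related_job_id = 5678
)

# The job ID is what later calls such as track_progress() expect.
job_id <- submitted$related_job_id
```

With a real batch, `submitted` would instead be the value returned by submit_batch(batch_id="batch_id").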
To track progress of a job, use:
track_progress(job_id="job_id")
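Because jobs can take a while, it may be convenient to poll until the job finishes. The helper below is a sketch, not part of captr: wait_for_job, its arguments, and the percent_completed field are assumptions (check the actual structure returned by track_progress before relying on it). The progress function is passed in as an argument so the helper does not depend on the network.

```r
# Hypothetical helper (not part of captr): poll a progress function until
# the job reports completion or we run out of checks.
# ASSUMPTION: the progress result has a `percent_completed` field.
wait_for_job <- function(job_id, get_progress, interval = 30, max_checks = 20) {
  for (i in seq_len(max_checks)) {
    progress <- get_progress(job_id)
    # isTRUE() guards against a missing or NA field.
    if (isTRUE(progress$percent_completed >= 100)) {
      return(TRUE)
    }
    # Wait between checks, but not after the final one.
    if (i < max_checks) Sys.sleep(interval)
  }
  FALSE
}
```

A call might look like wait_for_job("job_id", function(id) captr::track_progress(job_id = id)), which returns TRUE once the job is done and FALSE if it is still running after max_checks attempts.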
List all forms (instance sets) associated with a job:
list_instance_sets(job_id="job_id")
If you want to download data from a particular form, use list_instance_sets to get the form (instance_set) ID and run:
get_instance_set(instance_set_id="instance_set_id")
Get a CSV of all the results from a job:
get_all(job_id="job_id")
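The steps up to submission can be gathered into one function, sketched below. The function name prepare_batch and its arguments are hypothetical, and the code assumes that create_batch returns a list whose id field holds the batch ID, which the overview above does not confirm. The function deliberately stops before submit_batch so that the readiness check and the price can be reviewed first.

```r
# Sketch only: prepare a batch for submission using the calls shown above.
# Requires the captr package and a valid token; nothing is sent to
# Captricity until the function is actually called.
prepare_batch <- function(token, batch_name, template_id, image_paths) {
  captr::set_token(token)

  # ASSUMPTION: the returned list carries the batch ID in `id`.
  batch <- captr::create_batch(batch_name)
  batch_id <- batch$id

  # Register the template and attach it to the batch.
  captr::set_template_id(template_id)
  captr::set_batch_template(batch_id, template_id)

  # Upload every image to the batch.
  for (path in image_paths) {
    captr::upload_image(batch_id = batch_id, path_to_image = path)
  }

  # Return what the caller needs to decide whether to submit.
  list(
    batch_id = batch_id,
    readiness = captr::test_readiness(batch_id = batch_id),
    price = captr::batch_price(batch_id = batch_id)
  )
}
```

After inspecting the result, the batch can be submitted with submit_batch(batch_id = result$batch_id), where result is the value returned by prepare_batch.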
License
Scripts are released under the MIT License.
Contributor Code of Conduct
The project welcomes contributions from everyone! In fact, it depends on it. To maintain this welcoming atmosphere, and to collaborate in a fun and productive way, we expect contributors to the project to abide by the Contributor Code of Conduct.