TSTr (version 1.2)

SDkeeper: Pre-creates a data.table or a ternary search tree

Description

Pre-calculation step for symmetric delete spelling correction. Creates a data.table or a ternary search tree that stores the symmetrical deletions of the dictionary terms.

Usage

SDkeeper(input, maxdist, useTST = FALSE)

Arguments

input
a filepath to read from or a character vector containing the strings from which to create the symmetrical deletions.
maxdist
the maximum edit distance to use for spell checking. The literature on spelling correction claims that around 80% of spelling errors are at an edit distance of 1 from the target, and 99% are within an edit distance of 2. SDkeeper accepts a distance between 1 and 3.
useTST
specifies whether a TST is used to store the symmetrical deletions. Defaults to FALSE, in which case an indexed data.table is used instead (better performance).
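
To make the maxdist argument more concrete, the following sketch measures the edit distance between a misspelling and a small dictionary. It uses adist from the utils package that ships with R, purely as an illustration; it is an assumption that this matches the distance SDkeeper and SDcheck work with internally, and the dictionary vector is a made-up example.

```r
# Illustration only: adist() computes a generalized Levenshtein distance.
# It is not claimed to be what TSTr uses internally.
dictionary <- c("apple", "orange", "lemon")

# Distance from the misspelling "aple" to every dictionary term
d <- drop(utils::adist("aple", dictionary))
names(d) <- dictionary

# Terms reachable with maxdist = 1
dictionary[d <= 1]  # "apple"
```

Only "apple" lies within a distance of 1 of "aple", which is the kind of candidate a maxdist of 1 is meant to capture.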

Value

An object of class `data.table` or `tstTree` storing the symmetrical deletions of the specified distance.

Details

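Generates, for each input string, the terms within an edit distance of maxdist that can be obtained by deleting characters, and stores them in a data.table or a ternary search tree.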
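
The idea behind symmetric deletion can be sketched in a few lines of base R. The helper below, deletions, is hypothetical and not part of TSTr; it only illustrates how the deletion variants of a term could be enumerated, under the assumption that a variant is any string obtained by removing up to maxdist characters.

```r
# Hypothetical helper (not a TSTr function): returns `word` plus every string
# obtained from it by deleting between 1 and `maxdist` characters.
deletions <- function(word, maxdist) {
  result <- word
  frontier <- word
  for (d in seq_len(maxdist)) {
    nxt <- character(0)
    for (w in frontier) {
      n <- nchar(w)
      if (n == 0) next
      # Drop the i-th character for every position i
      nxt <- c(nxt, vapply(seq_len(n), function(i) {
        paste0(substr(w, 1, i - 1), substr(w, i + 1, n))
      }, character(1)))
    }
    frontier <- unique(nxt)
    result <- unique(c(result, frontier))
  }
  result
}

dict_deletes  <- deletions("apple", 1)  # pre-calculated once per dictionary term
query_deletes <- deletions("aple", 1)   # computed at lookup time
intersect(dict_deletes, query_deletes)  # "aple"
```

Because the dictionary term and the misspelling share a deletion variant, "apple" becomes a candidate correction for "aple" without generating insertions, substitutions or transpositions. Storing the dictionary variants ahead of time is what makes the lookup cheap.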

See Also

SDcheck

Examples

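library(TSTr)
# Indexed data.table holding deletions up to distance 2 (default storage)
fruitTable <- SDkeeper(c("apple", "orange", "lemon"), 2)
# Ternary search tree holding deletions up to distance 1
fruitTree <- SDkeeper(c("apple", "orange", "lemon"), 1, useTST = TRUE)
SDcheck(fruitTree, "aple")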
